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STATEMENT  OF  WORK 


Many  Navy  applications  in  the  area  of  personnel  assignment  demand  com¬ 
puter  hardware  several  orders  of  magnitude  faster  than  the  fastest  machines 
available.  Computer  designers  (including  Seymour  Cray)  have  turned  to  par¬ 
allelism  as  one  of  the  more  promising  avenues  for  increased  computational 
speed.  Parallelism  shifts  some  of  the  burden  for  increased  speed  from  the 
hardware  to  the  software  engineer.  Very  powerful  hardware  (in  terms  of  mil¬ 
lions  of  floating  point  operations  per  second)  can  be  built  using  many  low'  cost 
standard  chips,  all  designed  to  operate  in  parallel.  Our  research  program  ob¬ 
jective  is  to  develop  and  empirically  test  new  parallel  algorithms  and  software 
for  a  wide  variety  of  optimization  problems.  The  problems  studied  this  past 
year  include  the  the  mimimal  spanning  tree  problem,  the  shortest  path  prob¬ 
lem.  the  dense  assignment  problem,  and  the  generalized  network  model.  Par¬ 
allel  algorithms  were  developed  and  implemented  on  a  Sequent  Symmetry  S81 
multicomputer  having  twenty  computational  processing  units  (cpu’s).  We  also 
assisted  the  Navy  Personnel  Research  and  Development  Center  in  developing 
and  analyzing  readiness  models. 

ACCOMPLISHMENTS  AND  PUBLICATIONS 

The  following  is  a  list  of  the  papers  completed  during  the  last  year,  an  execu¬ 
tive  summary  of  the  paper,  and  the  publication  status.  These  four  papers  also 
appear  in  this  document. 

Title 

Minimal  Spanning  Trees:  An  Empirical  Investigation  of  Parallel  Algorithms, 
Technical  Report  87-OR-02 

Authors 

R.  S.  Barr,  R.  V.  Helgason,  and  J.  L.  Kennington 

Executive  Summary 

This  was  our  first  empirical  investigation  using  the  Sequent  Symmetry  S81 
multicomputer  for  parallel  processing  research.  This  problem  was  selected 
because  the  algorithms  for  sequential  machines  are  straightforward  and  we 
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could  concentrate  our  efforts  on  parallelization  and  learning  the  idiosyncrasies 
of  the  S81  system.  We  achieved  speedups  which  ranged  from  a  low  of  2.79  to 
a  high  of  6.81  using  ten  processors. 

Publication  Status 

This  paper  has  been  accepted  for  publication  in  Parallel  Computing. 


Title 

Dijkstra’s  Two-tree  Shortest  Path  Algorithm,  Technical  Report  88-OR-13 
Authors 

R.  V.  Helgason,  J.  L.  Kennington,  and  B.  D.  Stewart 
Executive  Summary 

The  problem  of  finding  the  shortest  path  between  a  designated  pair  of  nodes 
in  a  graph  is  a  fundamental  problem  in  operations  research  which  also  serves 
as  a  building  block  for  other  algorithms  such  as  the  out-of-kilter  algorithm 
and  the  shortest  augmenting  path  algorithm.  The  classical  Dijkstra  algorithm 
begins  at  one  of  the  designated  nodes  and  fans  out  from  this  node  until  the 
other  designated  node  becomes  a  member  of  the  labeled  set.  In  this  investiga¬ 
tion  we  empirically  demonstrate  that  a  better  algorithm  is  obtained  by  a  proce¬ 
dure  that  begins  at  both  designated  nodes  and  fans  out  in  both  directions. 
This  is  accomplished  mathematically  by  building  trees  rooted  at  the  two  desig¬ 
nated  nodes.  The  algorithm  stops  when  any  node  appears  in  both  trees. 

Publication  Status 

Additional  empirical  tests  have  been  performed  on  a  wide  variety  of  problems 
and  the  paper  is  being  revised  to  reflect  this  analysis.  We  anticipate  submit¬ 
ting  the  revised  paper  for  publication  by  the  end  of  August. 
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Title 


An  Empirical  Analysis  of  the  Dense  Assignment  Problem,  Technical  Report 
88-OR-16 

Authors 

J.  Kennington  and  Z.  Wang 
Executive  Summary 

There  are  three  methods  for  comparing  algorithms,  (i)  worst  case  analysis,  (ii) 
average  case  analysis,  and  (iii)  empirical  analysis.  Each  technique  has  its 
strengths  and  each  has  its  weaknesses.  We  performed  a  thorough  empirical 
analysis  comparing  the  auction  algorithm  with  the  shortest  augmenting  path 
algorithm  (SAP)  for  the  dense  assignment  problem.  We  found  that  the  soft¬ 
ware  implementation  of  the  SAP  algorithm  was  superior  on  serial  machines. 
A  parallel  implementation  of  this  software  yielded  speedups  of  four  using  ten 
processors.  It  appears  to  us  that  this  problem  is  solved.  We  successfully 
solved  problems  having  over  one  million  arcs  in  less  than  20  seconds  on  a 
Sequent  Symmetry  S81. 

Publication  Status 

This  paper  has  been  submitted  for  publication  and  is  currently  under  review. 


Title 

Solving  Generalized  Network  Problems  on  a  Shared  Memory  Multiprocessor, 
Technical  Report  88-OR-21 

Authors 

J.  Kennington  and  R.  Muthukrishnan 
Executive  Summary 

The  generalized  network  problem  in  its  most  general  form  is  a  special  case  of 
a  linear  program  in  which  every  column  of  the  constraint  matrix  has  at  most 
two  nonzero  elements.  Special  cases  of  this  model  include  the  flow  with  gains 
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problem,  the  minimal  cost  network  flow  problem,  the  transportation  problem, 
the  assignment  problem,  the  maximal  flow  problem,  and  the  shortest  path 
problem.  In  this  study  we  empirically  demonstrate  that  specialized  software 
for  this  model  is  an  order  of  magnitude  faster  than  MPSX.  This  algorithm 
was  parallelized  and  speedups  of  from  two  to  three  were  achieved  on  a  Se¬ 
quent  Symmetry  S8 1  multicomputer  using  eight  processors. 

Publication  Status 

Professor  Robert  Meyer  and  his  Ph.D.  student  Robert  Clark  of  the  University 
of  Wisconsin  have  independently  and  simultaneously  been  working  on  a  paral¬ 
lelization  of  the  simplex  algorithm  for  the  generalized  network  problem.  We 
are  combining  forces  with  this  group  to  combine  our  work  presented  in  this 
paper  with  their  w'ork  in  an  attempt  to  write  the  definitive  paper  on  the  sub¬ 
ject.  Professor  Kennington  visited  the  University  of  Wisconsin  in  June  1989, 
and  we  expect  to  complete  this  joint  paper  by  the  end  of  August  1989. 


PERSONNEL 

Faculty 

Jeffery  L.  Kennington 

Richard  V.  Helgason 

Ph.D  Students 

Muthukrishnan  Ramamurti 
Ph.D.  received  May  1989 

Dissertation  Title:  Parallel  Algorithms  for  Generalized  Networks 

Levent  Hatay, 

D.  Eng.  received  May  1989 

Praxis  Title:  Sequential  and  Parallel  Algorithms  for  the 
Shortest-Path  Problem 

Kumar  Thiagarajan 
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Zhiming  Wang 
Betty  Hickman 


PRESENTATIONS 

Denver  ORSA/TIMS  Fall  1988 

An  Asynchronous  Algorithm  to  Solve  Generalized  Network  Problems,  J.  Ken- 
nington  &  R.  Muthukrishnan 

Hierarchical  Graph  Partitioning  for  MIMD  Computers,  R.  Barr,  R.  Helgason, 
D.  Matula,  &  K.  Thiagarajan 

The  Transportation  Problem:  A  Shortest  Augmenting  Path  Algorithm,  Z. 
Wang 


Vancouver  CORS/TIMS/ORSA  Spring  1989 

Network  Flow  Problems:  Parallel  Algorithms  and  Computational  Experience, 
J.  Kennington 

Identifying  the  Vertices  of  the  Convex  Hull  of  a  Finite  Set  of  Points,  J.  Dula, 
R.  Helgason,  &  K.  Thiagarajan 

Solving  Assignment  Problems  on  a  Shared  Memory  Multiprocessor,  Z.  Wang 

Parallelization  Strategies  for  the  Network  Simplex  Algorithm,  B.  Hickman  & 
R.  Barr 


INTERACTION  WITH  NAVY  PERSONNEL  RESEARCH 
AND  DEVELOPMENT  CENTER 


The  Navy  Personnel  Research  and  Development  Center  located  at  San  Diego, 
California  is  a  co-sponsor  of  this  research.  Professor  Kennington  met  with 
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Dr.  Iosef  Krass  at  the  Denver  ORS A/TIMS  Meeting  and  with  Mr.  Ted 
Thompson  and  Mr.  Joe  Blanco  at  the  Vancouver  CORS/ORSA/TIMS  Meet¬ 
ing.  Professor  Kennington  visited  NPRDC  during  the  summer  of  1988  and 
gave  a  seminar  on  parallel  processing  to  the  operations  research  group.  We 
have  also  had  many  phone  conversations  with  both  Dr.  Timothy  Liang,  Mr. 
Ted  Thompson,  and  Mr.  Alan  Whisman  regarding  several  of  their  models. 
Our  main  goal  in  this  work  is  basic  research,  but  we  also  serve  NPRDC  as 
technical  advisors  and  provide  them  with  specialized  network  software  that 
our  group  has  developed  over  the  years. 
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Technical  Report  87-OR-02 


MINIMAL  SPANNING  TREES: 

AN  EMPIRICAL  INVESTIGATION  OF  PARALLEL 

ALGORITHMS 


R.S.  Barr 
R.V.  Helgaon 
and 

J.L.  Kennington 


Department  of  Operations  Research  and  Engineering  Management 
School  of  Engineering  and  Applied  Science 
Southern  Methodist  University 
Dallas,  Texas  75275 

revised  March  1989 


Comments  and  criticisms  from  interested  readers  are  cordially  invited. 


ABSTRACT 


The  objective  of  this  investigation  is  to  empirically  test  parallel  algorithms  for  finding 
minimal  spanning  trees.  Computational  tests  were  conducted  on  three  serial  versions  of 
Prim’s  algorithm,  a  serial  version  of  Kruskal’s  algorithm,  and  a  serial  version  of  the 
Sollin's  algoritl.u.  For  complete  graphs,  our  implementation  of  the  Prim  algorithm  is 
best.  \s  the  graph  density  is  reduced,  our  implementation  of  Kruskal’s  algorithm  is 
superior,  and  for  very  sparse  graphs,  the  Sollin  algorithm  dominates.  Parallel  implemen¬ 
tations  of  both  the  Prim  algorithm  and  the  Sollin  algorithm  were  empirically  tested  on  a 
Sequent  Symmetry  SSI  multicomputer.  In  tests  involving  ten  processors,  the  speedups 
ranged  from  a  low  of  2.79  to  a  high  of  6.81. 
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I.  INTRODUCTION 


The  new  generation  of  computing  hardware  based  on  parallel  processing  technology 
requires  that  algorithm  engineers  redesign  and  reevaluate  the  standard  methods.  It  may 
well  be  that  algorithms  which  proved  to  be  superior  for  single-processor  machines  may 
prove  to  be  in^.iior  in  some  of  the  new  parallel  processing  environments.  One  of  the 
more  popular  new  parallel  machines  is  Sequent  Computer  Systems’  Symmetry  S81.  The 
objective  of  this  investigation  is  to  computationally  test  parallel  algorithms  for  finding 
minimal  spanning  trees  on  a  twelve-processor  Symmetry  S81. 

An  undirected  graph  G  =  [V,E]  consists  of  a  vertex  set  V  and  an  edge  set  E.  Without 
loss  of  generality  we  assume  that  the  edges  are  distinct.  If  G'  =  [V,  E']  is  a  subgraph  of 
G,  then  G  is  called  a  spanning  subgraph  for  G.  If,  in  addition,  G'  is  a  tree,  then  G'  is 
called  a  spannir,,  tree  for  G.  A  graph  whose  components  are  trees  is  called  a  forest,  and 
a  spanning  subgraph  for  G,  which  is  also  a  forest,  is  called  a  spanning  forest  for  G.  We 
will  call  {[{Ui},  O]  :  Uj  E  V)  the  trivial  spanning  forest  for  G  and  each  [{Ui).  3>1  a  trivial 
tree.  Associated  with  each  edge  (u,v)  is  a  real-valued  cost  c(u,v).  The  minimum  span¬ 
ning  tree  problem  may  be  stated  as  follows:  Given  a  connected  undirected  graph  each  of 
whose  edges  has  a  real-value  cost,  find  a  spanning  tree  of  the  graph  whose  total  edge  cost 
is  minimum. 

Applications  (see  Christofides  [4])  include  the  design  of  a  distribution  network  in 
which  the  nodes  represent  cities  or  towns  and  the  edges  represent  electrical  power  lines, 
water  lines,  natural  gas  lines,  communication  links,  etc.  The  objective  is  to  design  a 
network  which  uses  the  least  length  of  cable  or  pipe.  The  minimum  spanning  tree  prob¬ 
lem  is  also  used  as  a  subproblem  for  algorithms  for  the  traveling  salesman  problem  (see 
Held  and  Karp  {6,  7]  and  Ali  and  Kennington  [3]).  Some  vehicle  routing  algorithms 
require  the  solution  of  a  traveling  salesman  problem  on  a  subset  of  nodes.  Hence,  a  wide 
variety  of  applications  require  the  solution  of  minimal  spanning  trees.  Some  applications 
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require  a  single  solution  and  some  use  the  model  as  a  subproblem  within  another  algo¬ 
rithm. 
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II.  THREE  CLASSICAL  ALGORITHMS 

The  algorithms  in  current  use  may  be  traced  to  ideas  developed  by  Prim,  Kruskal,  and 
Sollin.  These  three  classical  algorithms  all  begin  with  the  trivial  spanning  forest 
Go  =  {[Vi,  Tj],  i  =  0,  .  .  .  ,  |V|  -  1}.  A  sequence  of  spanning  forests  is  obtained  by  merging 
spanning  forest  components.  Given  spanning  forest  Gk,  a  nonforest  edge  (u,v)  is  selected 
and  the  components  [Vit  Tj]  and  [Vjt  Tj]  with  u6  V,  and  v  G  Vj  are  removed  from  Gk 
and  replaced  by  [V/,  T/],  where  i  =  k  +  | VJ,  V/  =  Vj  [J  Vj,  and  T^  =  Tj  [J  Tj  [J  {(u,v)}, 
yielding  spanning  forest  Gk+i.  After  m  =  [ V[— 1  edges  have  been  selected, 

Gm=  {[V2m,  T2m]}  is  a  minimal  spanning  tree  for  G. 

Let  [V,,  Tj]  and  [Vj,  Tj]  denote  two  disjoint  subtrees  of  G  and  define  the  shortest  dis¬ 
tance  between  the  trees  by  dy  =  min  {c(u,v):  (u,v)GE,u6  Vj,v6  Vj).  Let 
Ty  =  {(u,  v)  :  (u,  v)GE,uGVi,v£  Vj,  c(u,  v)  =  dy}.  The  nonforest  edge  selected  must 
be  an  element  of  Tjj  for  some  (i,j)  pair.  The  three  classical  algorithms  may  be  viewed  as 
using  different  selection  methods  for  the  nonforest  edge  from  among  all  Ty . 

In  Prim’s  algorithm,  the  nonforest  edge  (u,v)  for  Gk  is  always  selected  so  that 
(u,v)G  Vj  x  Vj.,  where  j*  is  the  largest  index  j  such  that  [Vj, Tj]  is  a  component  of  Gk. 
Thus  a  single  component  continues  to  grow'  as  trivial  trees  disappear.  An  excellent  de¬ 
scription  of  Prim’s  algorithm  is  given  n:  Papadimitriou  and  Steiglitz  [15,  p.  273],  along 
with  its  (serial)  computational  complexity  of  0(|V|2).  It  is  believed  that  this  algorithm  is 
best  suited  for  dense  graphs. 

In  the  Sollin  algorithm,  the  nonforest  edge  (u,v)  for  Gk  is  always  selected  so  that 
(u,v)E  Vj.  x  Vj,  where  i*  is  the  smallest  index  i  such  that  [Vj,  Tj]  is  a  component  of  Gk. 
Thus  a  variety  of  different-sized  components  may  be  produced  as  the  algorithm  proceeds. 
All  trivial  trees  will  be  removed  first  in  the  early  stages  of  this  algorithm.  A  description 
of  the  Sollin  algorithm  is  given  in  Papadimitriou  and  Steiglitz  [15,  p.  277],  along  with  its 
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(serial)  computational  complexity  of  0(|Ej  log  |V|).  This  algorithm  appears  to  be  best 
suited  for  sparse  graphs. 

Kruskal’s  method  may  be  viewed  as  an  application  of  the  greedy  algorithm.  The 
minimum  spanning  tree  is  constructed  by  examining  the  edges  in  order  of  increasing  cost. 
If  an  edge  forms  a  cycle  within  a  component  of  G^,  it  is  discarded.  Otherwise,  it  is 
selected  and  yields  Gk+i.  Here  also  different-sized  components  may  be  produced.  A 
description  of  Kruskal’s  algorithm  is  given  in  Sedgewick  [18,  pp.  412-413],  along  with  its 
(serial)  computational  complexity  of  0(|E|  log  |E|). 
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III.  COMPUTATIONAL  RESULTS  WITH  SEQUENTIAL  ALGORITHMS 

Computer  codes  for  the  SoIIin  algorithm,  Kruskal’s  algorithm,  and  three  versions  of 
Prim’s  algorithm  were  developed.  For  the  code  SPARSE  PRIM,  additional  data  lists  are 
utilized  so  that  the  sets  F[u]  =  {(u,v):  (u,v)EE}  and  B[v]  =  {(u,v):  (u,v)EE}  can  be  quickly 
determined  for  all  u,vEV.  This  is  called  maintaining  the  edge  data  in  both  forward  and 
backward  star  format.  DENSE  PRIM  maintains  the  edge  data  in  an  |V|  x  |V|  matrix. 
HEAP  PRIM  maintains  the  edge  data  in  both  forward  and  backward  star  format  and 
makes  use  of  a  d-heap  as  described  in  Tarjan  [19,  p.  77].  KRUSKAL  makes  use  of  a 
partial  quicksort  as  described  in  [1,  8]  to  produce  the  least-cost  remaining  edge.  SOLLIN 
is  a  straightforward  implementation  of  the  algorithm  presented  in  [15]. 

The  five  codes  were  tested  on  randomly  generated  graphs  whose  density  varied  from 
100%  (complete  graphs)  down  to  0.5%.  All  costs  were  uniformly  distributed  on  the 
interval  [0,  maxcost]  for  different  values  of  maxcost.  All  codes  are  written  in  FORTRAN 
for  the  Sequent  Symmetry  S81  (Rev.  A). 

The  computational  results  for  high  density  graphs  are  presented  in  Tables  1  and  2.  In 
both  tables  the  times  are  the  total  seconds  required  to  solve  three  problems  (excluding 
input  and  output).  The  cost  range  for  the  problems  in  Table  1  is  0  to  10,000  while  the 
cost  range  for  those  in  Table  2  is  0  to  100,000.  For  both  tables  DENSE  PRIM  was  best 
for  problems  having  densities  of  100%  and  80%.  However,  as  the  density  was  reduced, 
KRUSKAL  was  the  best  followed  closely  by  DENSE  PRIM  and  SPARSE  PRIM.  None  of 
the  five  codes  were  sensitive  to  the  cost  range  with  total  times  approximately  the  same  for 
both  tables.  SOLLIN  was  the  clear  loser  for  all  of  these  experiments  which  is  consistent 
with  its  worst  case  computational  complexity. 

The  computational  results  for  low  density  (5%  to  20%)  random  graphs  is  presented  in 
Table  3.  KRUSKAL  remained  the  clear  winner,  but  the  ranking  for  second  through  fifth 
changed.  SOLLIN  performs  much  better  as  the  density  decreases,  and  DENSE  PRIM 
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performs  much  worse.  KRUSKAL  is  very  robust,  performing  well  over  a  wide  range  of 
problem  densities. 

Computational  runs  for  very  sparse  problems  may  be  found  in  Table  4.  For  these 
runs  SOLLIN  dominated  with  HEAP  PRIM  placing  second.  This  is  consistent  with  the 
worst-case  complexity  analysis  of  the  Sollin  algorithm. 

Tables  1,  2,  3,  4  About  Here 
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IV.  PARALLEL  ALGORITHMS 

Parallel  versions  of  the  three  classical  algorithms  have  appeared  in  the  literature  (see 
[2,  5,  9,  10,  11,  12,  14,  16,  17]),  however;  no  computation  experience  has  been  reported. 
The  overhead  required  for  coordinating  the  work  of  multiple  processors  can  only  be  deter¬ 
mined  by  actual  implementation  and  empirical  investigation  on  a  parallel  processing  ma¬ 
chine.  The  Sequent  micro-tasking  software  facilitates  the  parallel  execution  of  subrou¬ 
tines.  The  details  regarding  parallel  processing  on  Sequent  machines  may  be  found  in 
Osterhaug  [1986]. 

Prim’s  Algorithm 

Let  dd[v,w]  denote  the  distance  from  node  v  to  node  w  and  let  d[v]^  v  denote  the 
node  nearest  node  v.  Prim’s  algorithm  requires  | V|— 1  iterations.  At  iteration  k,  the 
DENSE  PRIM  code  executes  the  two  modules  which  follow'. 

Module  MIN 

0.  a  «-  oe 

1.  for  i  =  1  to  |V|-k 

2.  v«-  status  [i] ; 

3.  if  dd[v,d[v]]  <  a  ,  then 

4.  a*~  dd[v,d[v]]; 

5.  m*-i; 

6.  end  if 

7.  end  for 
and 
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Module  UPDATE 


1.  for  i  =  1  to  jVj-k 

2.  v*-  status  [i] ; 

3.  if  dd[v,  d[v]]  >  dd[v,  statusfm]],  then  d[v]«-  status [m]; 

4.  end  for. 

Suppose  there  are  p  processes  and  id  gives  the  identification  number  (0,  1,  .  .  p-1)  of 

each  of  the  processes.  The  MIN  module  was  executed  simultaneously  by  the  processes  by 
modifying  step  1  to  be 

for  i  =  id+1  to  |V|-k  step  p 

followed  by  setting  the  global  minimum  to  the  smallest  of  the  p  local  minima.  The 
UPDATE  module  involves  no  dependencies  and  can  be  parallelized  by  the  same  modifica¬ 
tion  to  step  1.  A  similar  parallel  algorithm  has  been  described  by  Deo  and  Yoo  [5]. 

The  computational  experience  for  a  parallel  version  of  DENSE  PRIM  is  presented  in 
Table  5.  The  first  five  rows  of  Table  5  correspond  to  the  five  sequential  codes.  For  test 
problem  #1,  KRUSKAL  was  best  with  a  time  of  4.05  seconds  while  for  test  problem  #2 
DENSE  PRIM  was  best.  The  speedup  is  calculated  by  dividing  the  best  sequential  time  by 
the  parallel  time.  The  sixth  row,  which  gives  the  PARALLEL  DENSE  PRIM  code  run 
with  a  single  processor,  provides  a  measure  of  the  overhead  for  running  this  algorithm  in 
parallel.  The  overhead  is  approximately  22%  for  test  problem  #1,  and  20%  for  test 
problem  #2.  For  the  ten  processor  runs  the  speedups  were  2.79  and  4.39,  respectively. 


Table  5  About  Here 
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The  Sollin  algorithm  performed  best  on  sparse  graphs,  hence  we  studied  a  paralleli¬ 
zation  of  the  Sollin  algorithm  for  grid  graphs.  The  most  time-consuming  component  of 
the  Sollin’s  sequential  algorithm  may  be  described  by  the  following  procedure: 

for  all  (u,v)£E: 

let  i  and  j  denote  the  subtrees  containing  u  and  v,  respectively; 
if  i  *  j  then 

if  c[u,v]  <  d[i],  then  d[i]  *-  c[u,v]; 
if  c[u,v]  <  d[j],  then  d[jj  *-  c[u,v]; 
end  if 
end  for. 

That  is,  all  the  edge  costs  must  be  examined  and  certain  subtree  data  are  updated.  Our 
parallelization  of  this  scan  relies  upon  a  partitioning  of  the  grid  into  p  components  (one 
for  each  processor).  A  three-processor  partitioning  of  a  7  x  7  grid  network  is  illustrated 
in  Figure  1. 

Figure  1  About  Here 

The  above  edge  scan  is  performed  in  two  stages.  The  first  stage  performs  a  parallel 
scan  over  edges  both  of  whose  vertices  lie  within  the  same  partition.  The  second  stage 
performs  a  parallel  scan  over  edges  across  cut  sets.  If  each  partition  consists  of  at  least 
two  rows  of  the  grid,  then  all  subtree  data  updating  can  be  performed  independently 
without  the  requirement  of  a  lock. 


The  second  part  of  the  Sollin  algorithm  is  to  merge  two  subtrees  by  appending  a  new 
edge.  The  merger  of  subtrees,  both  of  which  lie  in  the  same  partition  can  also  be  exe¬ 
cuted  in  parallel.  A  related  algorithm  may  be  found  in  Quinn  [17]. 

The  computational  experience  with  the  parallel  version  of  the  Sollin  algorithm  for  grid 
graphs  may  be  found  in  Table  6.  For  these  two  test  problems,  the  overhead  for  parallel 
processing  was  only  4%.  The  speedups  for  the  ten  processor  runs  were  5.89  and  6.81. 

Table  6  About  Here 
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V.  SUMMARY  AND  CONCLUSIONS 

Five  computer  codes  were  developed  to  solve  the  minimum  spanning  tree  problem  on 
a  sequential  machine.  These  codes  were  computationally  compared  on  random  graphs 
whose  densities  varied  from  0.5%  to  100%.  An  implementation  of  Prim’s  algorithm  was 
best  for  100%  dense  problems  and  an  implementation  of  the  Sollin  algorithm  was  best  for 
sparse  problems.  None  of  the  codes  were  sensitive  to  the  cost  range.  Kruskal’s  algorithm 
using  a  modification  of  a  quicksort  was  the  most  robust  of  all  implementations,  w-orking 
very  well  on  a  wide  variety  of  problems.  Unfortunately,  a  quicksort  is  difficult  to  parallel¬ 
ize.  Both  the  DENSE  PRIM  code  and  the  SOLLIN  code  were  parallelized  by  the  method 
of  data  partitioning  (homogeneous  multitasking),  yielding  ten-processor  speedups  of  2.79 
up  to  6.81. 
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Figure  1.  A  Three  Processor  Partitioning  of  a 
7x7  Grid  Graph. 
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Table  1.  Comparison  of  Sequential  Codes  on  High  Density  Random  Graphs 
Having  a  Cost  Range  of  0  to  10,000 
(times  are  total  seconds  to  solve  three  problems) 


Table  2.  Comparison  of  Sequential  Codes  on  High  Density  Random  Graphs 
Having  a  Cost  Range  of  0  to  100,000 
(times  are  total  seconds  to  solve  three  problems) 


Graph 

SPARSE 

DENSE 

HEAP 

Vertices 

Density 

Edges 

PRIM 

PRIM 

PRIM 

SOLLIN 

KRUSKAL 

100% 

1.19 

0.90 

1.55 

3.04 

1.17 

80% 

15920 

1.00 

0.90 

1.31 

2.29 

0.96 

60% 

11940 

0.81 

0.90 

1.07 

1.72 

0.73 

40% 

7960 

0.62 

0.90 

0.81 

1.07 

0.67 

400 

100% 

79800 

4.91 

3.66 

5.88 

11.37 

4.38 

400 

80% 

4.14 

3.67 

4.93 

10.36 

2.70 

400 

60% 

47880 

3.36 

3.66 

3.92 

7.89 

2.69 

400 

40% 

31920 

2.58 

3.67 

2.87 

4.64 

1.79 

600 

100% 

179700 

11.89 

8.66 

13.58 

29.37 

9.57 

600 

80% 

143760 

9.54 

8.67 

11.23 

23.71 

8.42 

600 

60% 

107820 

7.71 

8.32 

8.62 

17.66 

6.37 

600 

40% 

71880 

5.86 

8.34 

6.14 

11.85 

4.37 

TOTAL  TIME  (sec) 

53.61 

52.25 

61.91 

124.97 

43.82 

RANK 

3 

2 

4 

5 

1 

Table  3.  Comparison  of  Sequential  Codes  on  Low  Density  Random  Graphs 
Having  a  Cost  Range  of  1  to  10,000 
(times  are  total  seconds  to  solve  three  problems) 


Graph 

SPARSE 

DENSE 

HEAP 

Vertices 

Density 

Edges 

PRIM 

PRIM 

PRIM 

SOLLIN 

KRUSKAL 

20% 

3980 

0.45 

0.90 

0.54 

0.54 

0.42 

10% 

1990 

0.37 

0.89 

0.41 

0.30 

0.33 

5% 

995 

0.33 

0.90 

0.34 

0.16 

0.25 

400 

20% 

15960 

1.81 

3.66 

1.75 

2.69 

1.32 

400 

10% 

7980 

1.43 

3.64 

1.16 

1.39 

1.00 

400 

5% 

3990 

1.25 

3.64 

0.90 

0.72 

0.81 

600 

20% 

35940 

4.10 

8.32 

3.62 

6.07 

2.47 

600 

10% 

17970 

3.22 

8.26 

2.28 

3.09 

2.06 

600 

5% 

8985 

2.79 

8.27 

1.61 

1.53 

1.43 

TOTAL  TIME  (sec) 

15.75 

38.48 

12.61 

16.49 

10.09 

RANK 

3 

5 

2 

4 

1 

Table  4.  Comparison  of  Sequential  Codes  on  Sparse  Random  Graphs 
Having  a  Cost  Range  of  0  to  10,000 
(times  are  total  seconds  to  solve  three  problems) 


Graph 

SPARSE 

DENSE 

HEAP 

Vertices 

Density 

Edges 

PRIM 

PRIM 

PRIM 

SOLLIN 

KRUSKAL 

2.0% 

1.0% 

0.5% 

2.0% 

1.0% 

0.5% 

2.0% 

1.0% 

0.5% 


6392 

3196 

1548 

9990 

4995 

2498 

14388 

7194 

3597 


TOTAL  TIME  (sec) 


10.02 

9.66 

9.47 


62.47 


15.03 

14.78 
14.30 

23.95 

23.54 

22.79 

35.08 

34.40 

33.45 


217.32 


19.23 


11.38 


22.11 


RANK 


**  * 


Table  5.  Parallel  DENSE  PRIM  on  G  =  [V,E]  with  |V|  =  900  and  |E|  =  404,000 
Having  a  Cost  Range  of  0  to  100,000 
(all  parallel  times  are  the  average  for  five  runs) 


Problem  1 

Problem  2 

cpu’s 

Algorithm 

time 

(sec) 

speedup 

time 

(sec) 

speedup 

1 

SPARSE  PRIM 

8.89 

0.46 

8.89 

0.73 

1 

DENSE  PRIM 

6.39 

0.63 

6.45 

1.00 

1 

HEAP  PRIM 

9.80 

0.41 

9.74 

0.66 

1 

SOT  1  IN 

22.22 

0.18 

26.31 

0.25 

1 

KRUSKAL 

4.05 

1.00 

7.35 

0.88 

1 

PARALLEL 
DENSE  PRIM 

7.77 

0.52 

7.73 

0.83 

2 

PARALLEL 
DENSE  PRIM 

4.15 

0.98 

4.14 

1.56 

3 

PARALLEL 
DENSE  PRIM 

2.95 

1.37 

2.92 

2.21 

4 

PARALLEL 
DENSE  PRIM 

2.37 

1.71 

2.36 

2.73 

5 

PARALLEL 
DENSE  PRIM 

2.01 

2.01 

1.99 

3.24 

6 

PARALLEL 
DENSE  PRIM 

1.79 

2.26 

1.79 

3.60 

7 

PARALLEL 
DENSE  PRIM 

1.64 

2.47 

1.65 

3.91 

8 

PARALLEL 
DENSE  PRIM 

1.55 

2.61 

1.56 

4.13 

9 

PARALLEL 
DENSE  PRIM 

1.48 

2.74 

1.49 

4.33 

10 

PARALLEL 
DENSE  PRIM 

1.45 

2.79 

1.47 

4.39 

Table  6.  Parallel  SOLLIN  on  350  x  350  Grid  Graphs  With  |V|  =  122,500, 
and  |E|  =  244,300  Having  a  Cost  Range  of  0  to  10,000 
(all  times  are  the  average  for  five  runs) 


E 


Problem  1 

Problem  2 

cpu’s 

time 

(sec) 

speedup 

time 

(sec) 

speedup 

It 

34.33 

1.00 

36.99 

1.00 

1* 

35.64 

0.96 

38.47 

0.96 

2 

19.59 

1.75 

19.62 

1.89 

3 

13.68 

2.51 

12.61 

2.93 

4 

9.85 

3.49 

9.83 

3.76 

5 

8.77 

3.91 

8.62 

4.29 

6 

7.43 

4.62 

7.27 

5.09 

7 

7.07 

4.86 

6.90 

5.36 

8 

6.55 

5.24 

6.64 

5.57 

9 

6.82 

5.03 

6.26 

5.91 

10 

5.83 

5.89 

5.43 

6.81 

t  best  sequential  SOLLIN  code 
*  parallel  code  run  with  a  single  processor 
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ABSTRACT 


The  objective  of  this  study  is  to  computationally  investigate  a  version  of  Dijkstra’s  algo¬ 
rithm  for  the  problem  of  finding  the  shortest  path  between  two  nodes  in  a  graph.  The 
classical  Dijkstra  algorithm  builds  a  shortest  path  tree  rooted  at  one  of  the  designated 
nodes.  This  method  is  computationally  compared  with  a  version  that  builds  tw'o  shortest 
path  trees  rooted  at  each  of  the  two  designated  nodes.  Termination  occurs  when  any  node 
appears  in  both  trees.  Computationally  we  found  that  the  classical  Dijkstra  method  built 
trees  containing  approximately  50%  of  the  nodes  in  the  original  graph  while  the  two-tree 
method  terminated  with  only  6%  of  the  nodes  in  the  two  trees.  In  computational  experi¬ 
ments  involving  over  480  test  problems,  the  two-tree  method  produced  a  speedup  of  over 
four. 
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I.  INTRODUCTION 


Since  the  late  fifties  when  the  first  methods  were  developed,  the  shortest  path  problem 
has  become  one  of  the  fundamental  problems  in  the  areas  of  combinatorial  optimization, 
computer  science,  and  operations  research.  Algorithms  and  applications  are  commonly 
found  in  the  important  books  in  these  areas  (see  for  example  [BeGa],  [BeGh],  [H],  [JB], 
[L],  [PS],  [Q],  and  [T] ) .  The  study  of  this  problem  has  been  motivated  by  both  its 
elegant  mathematical  structure  and  its  many  practical  applications.  Our  recent  interest  in 
this  problem  was  occasioned  by  the  need  to  solve  shortest  path  subproblems  in  several 
mathematical  optimization  procedures  we  are  developing  in  an  MIMD  parallel  computing 
environment. 

Consider  the  network  G  =  [V,  E]  with  node  set  V  and  arc  set 
E  C  (V  x  V)\{(i,  i)  :i£  V).  Let  c(i,  j)  denote  the  length  of  edge  (i,  j).  A  path  in  G 
from  s  £  V  to  t  £  V  is  a  sequence  of  distinct  edges 
(vi,v2),  (v2,v3), ....  (vk_i,vk)  such  that  V!  =  s,  vk  =  t,  and  every  edge  (vjt  vj+1)  €  E.  For 
brevity,  we  will  also  denote  the  path  above  simply  as  s  =  vj,  v2 . vk_j,  vk  =  t.  The  path 

(k-i) 

length  is  given  by  c(vj,  vjtl)-  The  (two  node)  shortest  path  problem  is  defined  as 

j  =  i 

follows:  Given  two  distinct  nodes  s  and  t,  find  a  directed  path  in  G  from  s  to  t  with 
minimum  length. 

Excellent  surveys  for  the  many  variations  of  the  shortest  path  problem  may  be  found 
in  Deo  and  Pang  [DP]  and  Gallo  and  Pallottino  [GP].  A  survey  of  techniques  and 
computational  comparisons  may  be  found  in  Dial,  Glover,  Kamey,  and  Klingman 
[DGKKa],  and  [DGKKb],  in  Klingman,  Mote,  and  Whitman  [KMW],  in  Glover,  Glover, 
and  Klingman  [GGK],  in  Desrochers  [De],  and  in  Divoky  [Di] .  In  [DGKKa]  all  the  meth¬ 
ods  are  grouped  into  two  general  classes:  label-setting  algorithms  and  label-correcting 
algorithms.  Dijkstra  is  credited  with  the  first  label-setting  algorithm  and  any  algorithm 
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which  uses  this  approach  has  been  considered  a  particular  implementation  of  Dijkstra’s 
original  algorithm  (see  [GP]). 

The  Dijkstra  algorithm  is  restricted  to  problems  having  non-negative  edge  lengths  and 
it  builds  a  shortest  path  tree  rooted  at  s.  At  each  iteration  at  least  one  new  node  and 
edge  are  appended  to  the  shortest  path  tree.  Hence,  after  at  most  |V|  -  1  iterations  t  is 
appended  to  the  tree  and  the  shortest  path  from  s  to  t  is  known.  It  occurred  to  us  that  in 
an  MIMD  parallel  computing  environment  shortest  path  trees  could  be  grown  from  both  s 
and  t  using  two  independent  processors  and  that  when  the  trees  meet  we  should  have  a 
solution.  A  simplistic  argument  led  us  to  believe  that  on  the  average  we  should  obtain  a 
solution  in  half  the  time  taken  by  a  typical  serial  implementation.  Early  experimental 
results  with  a  parallel  implementation  of  this  strategy  applied  to  both  the  Dijkstra  algo¬ 
rithm  for  the  shortest  path  problem  and  the  closely-related  painted  network  algorithm  for 
the  painted  problem  (see  [R])  were  excellent.  This  led  us  to  conjecture  that  a  serial 
algorithm  which  mimics  the  action  of  the  two  parallel  processors  should  be  superior  to  the 
usual  serial  implementation.  We  subsequently  developed  such  a  serial  implementation, 
and  our  conjecture  was  born  out.  A  search  of  the  literature  revealed  that  a  serial  algo¬ 
rithm  of  this  type,  which  we  call  a  two-tree  Dijkstra  algorithm,  had  been  anticipated. 

In  1960  Dantzig  [Da]  suggested  that  a  pair  of  trees  be  built  with  one  rooted  at  s  and 
the  other  rooted  at  t.  No  stopping  criteria  were  given.  This  strategy  also  appears  in  the 
1962  book  by  Berge  and  Ghouila-Houri  [BeGh]  with  an  incorrect  stopping  criterion. 
Nicholson  [N]  was  the  first  to  present  a  correct  analysis  of  the  Dijkstra  two-tree  algo¬ 
rithm.  Additional  discussion  may  be  found  in  the  1969  survey  by  Dreyfus  [Dr],  Unfor- 
runately,  the  literature  contains  no  computational  studies  and  the  excellent  speedup  possi¬ 
ble  with  the  two-tree  Dijkstra  algorithm  has  apparently  not  been  previously  observed. 

The  explanation  for  the  excellent  behavior  of  the  two-tree  Dijkstra  algorithm  lies  in 
the  small  size  of  each  rooted  tree  when  the  stopping  criterion  is  satisfied.  In  our  compu- 
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tational  study,  we  found  that  the  original  Dijkstra  algorithm  generated  a  shortest  path  tree 
containing  approximately  50%  of  the  original  nodes.  However,  both  trees  together  in  the 
two-tree  method  contain  only  6%  of  the  original  nodes.  This  is  an  astonishing  observa¬ 
tion  which  can  be  used  to  improve  numerous  algorithms  which  depend  upon  the  repeated 
solution  of  shortest  path  problems.  It  is  the  purpose  of  this  paper  to  communicate  our 
experimental  results  which  indicate  that  the  two-way  Dijkstra  should  be  used  in  prefer¬ 
ence  to  the  ordinary  Dijkstra  algorithm,  especially  within  algorithms  which  depend  upon 
the  repeated  solution  of  shortest  path  problems.  In  addition,  we  present  the  algorithms  in 
detail  and  a  convergence  proof  with  an  appropriate  stopping  criterion.  Our  experience 
points  up  the  lesson  that  examination  of  algorithms  for  the  parallel  computing  environ¬ 
ment  may  well  lead  to  better  design  of  algorithms  for  the  serial  environment. 
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II.  THE  CLASSICAL  ALGORITHM 


Dijkstra’s  classical  algorithm  begins  at  node  s  and  builds  a  shortest  path  tree  in  which 
the  shortest  path  from  s  to  any  node  in  the  tree  is  known.  When  node  t  is  placed  in  the 
tree  we  have  a  minimum  length  directed  path  from  s  to  t.  The  algorithm  may  be  stated  as 
follows: 

DIJKSTRA’S  SHORTEST  PATH  ALGORITHM 
Input: 

1.  A  graph  G  =  [V,  E]  with  node  set  V  and  edge  set  E. 

2.  A  length  c(i,  j),  for  each  edge  (i,  j)e  E  . 

3.  Two  nodes  s  and  t,  between  which  a  shortest  path  is  desired. 

Working  Entities: 

1.  A  set  Q  of  nodes  which  are  to  be  scanned. 

2.  A  set  R  of  nodes  whose  labels  are  not  permanent. 

3.  A  set  of  labels  {d(j)}  for  node  distance  from  s. 

4.  A  set  of  labels  {pG)}  for  node  predecessor  in  tree. 

5.  A  minimum  distance  value  u  for  nodes  whose  labels  are  not 
permanent. 

Output: 

1.  A  shortest  path  in  G  from  s  to  t  implicit  in  labels  {pQ)}.  Explicitly,  the  path  is 

s  =  pw(t),  Pw-i(t), ....  p2(t),  p(t),  t, 

where  pi(t)  is  defined  recursively  by  pi(t)  =  p(t)  and  p;(t)  =  p(pi-i(t))  for  i  2:  2. 

2.  The  length  u  of  a  shortest  path  in  G  from  s  to  t. 

Assumptions: 

All  c(i,  j)  >  0,  s  *  t,  and  there  exists  a  directed  path  in  G  from  s  to  t. 
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procedure  DIJKSTRA1: 


begin 

1  R  *-  V,  for  all  k  G  R,  d(k) «-  oo;  d(s)  *-  0,  p(s)  *-  s; 

2  u  *—  min {d(k)  :  k  E  R}, Q  *-  {k  E  R  :  d(k)  =  u},  select  i  G  Q,R<-  R\{i}; 

3  for  all  (i,j)  E  ({i}  x  R)nE, 

4  if  d(j)  >  u  +  c(i,  j),  then 

5  d(j)  *-  u  +  c(i,  j),  pO)  *-  i; 

6  end  if 

7  end  for 

8  if  i  *  t,  then  go  to  2; 

end. 

We  define  D(j)  to  be  the  length  of  a  shortest  path  in  G  from  node  s  to  node  j.  We  will 
say  that  an  iteration  has  occurred  each  time  that  Steps  2  through  7  have  been  completed 
and  at  the  end  of  an  iteration  the  node  i  will  be  said  to  be  permanently  labeled  and 
scanned.  Proof  for  the  following  two  propositions  may  be  found  in  Even  [E] . 

Proposition  1.  During  the  execution  of  DIJKSTRA1,  if  d(j)<°°  ,  then  there  is  a  path  in  G 
from  s  to  j  of  length  dQ). 

Proposition  2.  If  j  E  Q  in  Step  2  of  DUKSTRA1,  then  d(j)  =  D(j). 

The  following  four  results  are  readily  obtained  from  the  previous  propositions. 
Proposition  3.  During  the  execution  of  DUKSTRA1,  d(j)  £  D(j). 

Proposition  4.  During  the  execution  of  DLJKSTRA1,  if  j  £  R,  d(j)  =  DO)- 

Proposition  5.  During  the  execution  of  DIJKSTRA1,  if  j  E  R,  dQ)^  D(q)  for  all  q  ?  R. 

Proposition  6.  During  the  execution  of  DIJKSTRA1,  if  j  £  R  and  j  *  s,  then  p0)£  R- 
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Proposition  7.  If  (q,  r)E  E,  D(q)  <  D(r)  +  c(q,  r). 

Proof.  Let  s  =  vlt  v2 . vk  =  r  is  any  shortest  path  in  G  from  s  to  r. 

Case  1.  Assume  (q,  r)  *  (vk-i,vk).  Then  edge  (q,r)  extends  the  path  to  a  path  in  G 

from  s  to  q  of  length  D(r)  +  c(q,r).  Hence  D(q)  ^  D(r)  +c(q,r). 

Case  2.  Assume  (q,  r)  =  (vk_j,vk).  Then  D(r)=D(q)+c(q,r). 

Hence,  D(q)=D(r)-c(q,r)  <  D(r)+c(q,r),  since  c(q,r)£  0.  □ 

Proposition  8.  After  the  first  execution  of  Step  2  in  DUKSTRA1,  if  i  E  R,  there  exists  a 
shortest  path  s  =  Vj,  v2, ....  vk  =  i  in  G  from  s  to  i  for  which  an  integer  j  satisfying 

1  <  j  <  k  exists  such  that  {vq . Vj_i}  n  R  =  0and  {vj vk}  C  R. 

Proof.  After  Step  1,  R=V,  d(s)=0,  and  d(r)  =  <*  for  all  r  E  V  w'here  r  *  s.  Thus  the 
first  time  through  Step  2  produces  u=0,  Q={s},  and  R=V\{s},  so  that  s  £  R  from  then  on. 
Let  s  =  yi,y2t  .  .  .  ,  yq  =  i  be  any  shortest  path  in  G  from  s  to  i.  Let  r  be  the  smallest 

integer  such  that  yr  E  R.  Then  {y,+i ,  .  .  .  ,yq}  C  R.  If  r=  1,  yi,y2 . yq  is  the 

path  we  seek  and  )~2.  Otherwise,  reversing  the  path  given  by 
yr,  p(yr),  •••.Pw-i(yr),pw(yr)  =  s 

and  adjoining  it  to  yr+j,  .  .  .  ,  yq  produces  the  path  we  seek,  with  j  =  r  +  1.  By  Prop.  6, 
all  nodes  in  the  reversed  path  are  not  in  R.  □ 

That  an  arbitrary  shortest  path  in  G  from  s  to  i  may  not  exhibit  the  property  of  Prop.  8  is 
shown  in  Figure  1. 
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II.  DIJKSTRA’S  TWO-TREE  ALGORITHM 


The  two-tree  algorithm  builds  a  pair  of  shortest  path  trees  which  we  call  the  left  tree 
and  the  right  tree.  The  left  tree  contains  s  while  the  right  tree  contains  t.  The  two  trees 
are  grown  in  alternate  steps  and  termination  occurs  when  a  node  appears  in  both  trees. 
For  the  problem  illustrated  in  Figure  2,  termination  occurred  at  iteration  4  when  node  3 
appeared  in  both  trees.  However,  one  should  note  that  node  3  is  not  contained  in  the 
shortest  path.  To  determine  the  shortest  path,  we  add  the  left  and  right  labels  and  select 
a  node  with  smallest  sum.  For  this  example,  the  sum  of  the  labels  are  as  follows:  (node 
1,  00 ),  (node  2,  5),  (node  3,  6),  (node  4,  5),  (node  5,°°).  Therefore,  the  shortest  path 
has  length  5  and  is  given  by  (1,  2),  (2,  4),  (4,  5).  The  two-tree  Dijkstra  algorithm  may  be 
stated  as  follows: 


DIJKSTRA’S  TVYO-TREE  SHORTEST  PATH  ALGORITHM 

Input: 

1.  A  graph  G  =  [V,  E]  with  node  set  V  and  edge  set  E. 

2.  A  length  c(i,  j)  for  each  edge  (i,  j)E  E  . 

3.  Two  nodes  s  and  t,  between  which  a  shortest  path  is  desired. 

Working  Entities: 

1.  Sets  Qs  and  Q'  of  nodes  which  are  to  be  scanned  with  respect 
to  the  trees  rooted  at  nodes  s  and  t,  respectively. 

2.  Sets  Rsand  R1  of  nodes  whose  labels  are  not  permanent  with  respect 
to  the  trees  rooted  at  nodes  s  and  t,  respectively. 

3.  Sets  of  labels  {ds(j)}  and  (d'(j)}  for  node  distances  from  nodes  s 
and  t,  respectively. 
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4.  Sets  of  labels  {ps0)l  and  (p’G)}  for  node  predecessors  in  trees 
rooted  at  nodes  s  and  t,  respectively. 

5.  Minimum  distance  values  us  and  u'  for  nodes  whose  labels  are  not  permanent 
in  the  trees  rooted  at  nodes  s  and  t,  respectively. 

Output: 

1.  The  length  u  of  a  shortest  path  in  G  from  s  to  t. 

2.  A  set  of  J  nodes  each  lying  on  a  shortest  path  in  G  from  s  to  t. 

3.  A  shortest  path  in  G  from  s  to  t  implicit  in  J  and  the  predecesor  labels  {pj}. 
Explicitly,  the  path  is 

s  =  pu(r),psu-i(r),...,ps(r),r,  p'(r),  .  .  .  ,  p‘_,(r),  pl(r)  =  t, 
where  pf(-)  and  pj(-)  are  defined  analogously  to  Pi(-)  of  Section  I. 

Assumptions: 

All  c(i,  j)  >  0,  s  ^  t,  and  there  exists  a  directed  path  in  G  from  s  to  t. 

procedure  DIJKSTRA2: 

begin 

Is  Rs  —  V,  for  all  k  E  Rs,  ds(k)  —  oo;  ds(s)  —  0,  ps(s)  —  s; 

1'  Rl  *-  V,  for  all  k  E  R\  d’(k)  « ;  d*(t)  *-  0,  p*(t)  —  t; 

2s  us  —  min  {ds(k)  :  k  E  R5},  Qs  —  {k  E  Rs :  ds(k)  =  us},  select  i  E  Qs,  Rs  «-  Rs\{i>; 

3s  for  all  (i,  j)  E  ({i}  x  Rs)  D  E, 

4s  if  ds(j)  >  us  +  c(i,  j),  then 

5s  dsG)  *-  us  +  c(i,  j),  ps0)  «-  i; 

6s  end  if 

7s  end  for 

8s  if  i  E  R’,  then  go  to  9; 

2*  u1  *-  min  (d’(k)  :  k  E  R'},  Q' «-  {k  E  R' :  d’(k)  =  u1},  select  j  E  Q\  R'  *-  R'\G); 
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3‘  for  all  (i,j)  E  (R'  x  {j})nE, 

41  if  d'(i)  >  u1  +  c(i,  j),  then 

5'  d'(i)  *-  u'  +  c(i,j),  p'(i)  *-  j; 

6‘  end  if 

7'  end  for 

8‘  if  j  £  Rs,  then  go  to  2s; 

9  u  —  min  {ds(j)  +  d'(j)  :  j  E  (V\RS)  (J  (V\R')} 

10  J  —  {j  E  (V\RS)  (J  (V\R‘)  :  dsQ  +  d‘(j)  =  u}; 

end 

Note  that  with  minor  notational  adjustments,  Propositions  1  through  8  also  apply  to  the 
tree-building  steps  Is  through  8s  for  the  tree  rooted  at  s  as  well  as  1’  through  8'  for  the 
tree  rooted  at  t,  with  the  roles  of  s  and  t  reversed. 

Proposition  9.  (Nicholson  [N])  When  DIJKSTRA2  terminates,  the  length  of  the  shortest 
path  in  G  from  s  to  t  is  given  by 

u  =  min  {ds(j)  +  dl(j)  :  j  E  (V\RS)  U  (V\R1)}  •  (1 ) 

Proof.  Let  n  E  (V\RS)  ("1  (V\R’)  and  let  r  be  any  node  of  (V\RS)  (J  (VXR1)  such  that 
ds(r)  +  d'(r)  =  u.  Let  i  be  an  arbitrary  node  of  V.  We  will  show  that 

u  <  Ds(i)  +  D'(i)  (2) 

Case  1.  Assume  i  E  (V\RS)  and  i  E  (V\R’).  By  Prop.  4,  Ds(i)  =  ds(i)  and  D’(i)  =  d’(i), 
so  that  Ds(i)  +  D'(i)  =  ds(i)  +  d‘(i)-  Also,  i  E  (V\RS)  U  (V\R'),  so  that  u  <  ds(i)  +  d*(0- 
Thus  (2)  holds. 

Case  2.  Assume  i  ^  (V\RS)  and  i  ^  (V\R‘).  Then  i  E  Rs  and  i  E  R'.  Since 
n  £  Rs  and  n  $  R\  by  Prop.  5,  Ds(i)  £  ds(n)  and  D’(i)  s  d’(n),  so  that 
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Ds(i)  +  D’(i)  >  ds(n)  +  d’(n).  Since  n  E  (V\RS)  U  (V\R')t  ds(n)  +  d'(n)  >  u.  Thus  (2) 
holds. 

Case  3.  Assume  i  £  (V\RS)  and  i  E  (VXR*).  By  Prop.  8  there  exists  a  shortest  path 
s  =  Vj,  v2,  .  .  .  ,  vk  =  i  in  G  from  s  to  i  for  which  an  integer  w  satisfying  1  <  w  <  k  exists 

such  that  {vi,  .  .  .  ,vw_1}nRs  =  0  and  {vw . VjJ  C  Rs.  For  any  j  such  that 

1  <  j  <  k  ,  we  must  have  that 

Ds(vj+i)  -  Ds(vj)  =  c(vj,  vj+1).  (3) 

Also,  for  any  j  such  that  1  <  j  <  k,  we  have  from  Prop.  6  that 

D’(vjtl)  -  D'(vj)  >  -  c(vj,  vj+  j ) .  (4) 

Let  p  be  any  integer  such  that  1  <  p  <  k  - 1  Summing  both  (3)  and  (4)  with  j  taking  on 
all  integral  values  from  p  to  w-1,  we  obtain 

0-1) 

Ds(vk)  -  Ds(vp)  =  £  c(vj,vj+1)  (5) 

j  =  p 

and 

(k-l) 

D'(vk)  -  D'(Vp)  S:  X  c(Vj,vjtl).  (6) 

j  =  p 

Adding  (5)  and  (6)  and  using  vk  =  i,  we  obtain  for  all  p  such  that  1  <  p  <  k  -  1, 

Ds(i)  +  D'(i)  >  Ds(vp)  +  D'(Vp).  (7) 

Subcase  3.1  Assume  there  is  an  integer  h  such  that  1  <  h  s  w  -  1  and  vh  £  R’.  We 
also  have  that  vh  £  Rs,  so  that  by  Prop.  4,  d'(vh)  =  D’(vh)  and  ds(vh)  *  Ds(vh).  By  (7), 
Ds(i)  +  D'(i)  >  Ds(vh)  +  D’(vh)  =  ds(vh)  +  dVh).  Since  vh  E  (V\RS)  U  (V\R'), 
ds(Vh)  +  d'(Vh)  ^  u.  Thus  (2)  holds. 

Subcase  3.2  Assume  there  is  an  integer  1  such  that  w  ^  1  <  k  and  v,  E  R'.  Since 
1  £  w,  Vj  E  Rs.  By  Case  2,  us  Ds(vj)  +  D’(vi).  From  (7), 

Ds(i)  +  D’(i)  >  Ds(vi)  +  D’(vj),  so  that  (2)  holds. 
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Subcase  3.3  Assume  that  {vt . vwM}  c  R'  and  {vw,  .  .  .  ,  vk}nRI  =  0.  Since 

vw-i  £  Rs  and  (vw_i,  vw)  £  (Qs  x  R!)  HE  on  the  iteration  vw_jwas  scanned, 

ds(vw)  <  Ds(vw_!)  +  c(vw_!,  vw).  (8) 


From  (3), 

Ds(vw)  =  Ds(vw.,)  +  c(vw_j,  vw).  (9) 

From  (8)  and  (9),  ds(vw)  <  Ds(vw)  and  from  Prop.  3,  ds(vw)  ^  Ds(vw)  ,  so  that 

Ds(vw)  =  ds(vw).  (10) 


Since  vw  £  R',  from  Prop.  4, 
D'(vw)  =  d'(v-w). 


(ID 


Adding  (10)  and  (11),  we  have  that 

Ds(vw)  +  D'(vw)  =  d*(vw)  +  d'(vw)- 
Since  vw  £  V\R\  vw  £  (V\RS)  (J  (V\R'),  so  that 
u  <  ds(vw)  +  d’(vw). 


From  (7), 

Ds(i)  +  D’(i)  s  Ds(vw)  +  D’(vw). 
From  (12),  (13),  and  (14),  we  have  that  (2)  holds. 


(12) 

(13) 

(14) 


Case  4.  Assume  i  &  (V\RS)  and  i  £  (VXR*).  An  argument  similar  to  that  of  Case  3 
shows  that  (2)  holds,  a 
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IV.  COMPUTATIONAL  EXPERIENCE 

Both  algorithms  have  been  coded  and  run  on  dense  mxn  bipartite  graphs  having  2mn 
edges.  The  data  structure  used  is  identical  to  that  used  by  Jonker  and  Volgenant  [JV]  in 
their  highly  successful  assignment  code.  Both  codes  are  written  in  FORTRAN  and  were 
run  on  a  Sequent  Symmetry  S81  using  a  single  Intel  80386  cpu.  The  computational 
experience  is  presented  in  Table  1. 

Each  data  point  in  Table  1  is  the  sum  for  twenty  problems,  i.e.  240  problems  were 
solved  with  three  different  codes.  For  each  set  of  20  problems,  the  cost  structure  was  the 
same,  and  s  was  selected  at  random  from  among  the  left  side  nodes  and  t  was  selected  at 
random  from  among  the  right  side  nodes. 

The  Dijkstra  Left  Side  Only  builds  the  left  tree  while  the  Dijkstra  Right  Side  Only 
builds  the  right  tree.  The  column  entitled  iter  gives  the  number  of  nodes  placed  in  the 
shortest  path  trees  for  all  twenty  problems.  For  the  1000  x  1000  problems  with  2,000,000 
edges,  the  Dijkstra  Left  Side  Only  algorithm  builds  a  shortest  path  tree  with  1057  nodes 
on  the  average.  For  this  set  of  eighty  problems,  the  Dijkstra  two-tree  algorithm  builds 
shortest  path  trees  with  a  total  of  110  nodes  on  the  average.  This  is  an  extraordinary 
reduction  in  the  amount  of  computational  work  required  to  solve  these  eighty  problems. 
However,  there  is  overhead  involved  in  the  two-tree  algorithm  which  accounts  for  a 
speedup  of  four  while  the  reduction  in  the  number  of  nodes  placed  in  the  shortest  path 
trees  is  a  factor  of  eight.  The  Dijkstra  two-tree  algorithm  also  takes  almost  double  the 
storage  of  the  classical  Dijkstra  method.  The  comparison  of  storage  is  given  in  Table  2. 

Similar  codes  were  developed  and  run  on  twelve  of  the  (nonbipartite)  NETGEN  [KNS] 
problems.  The  computational  experience  is  presented  in  Table  3.  Each  data  point  is  the 
sum  for  twenty  problems.  From  each  NETGEN  problem,  twenty  problems  were  gener¬ 
ated  by  randomly  selecting  s  to  be  a  source  and  t  to  be  a  sink.  Given  randomly  selected  s 
and  t  nodes,  the  NETGEN  problems  may  not  contain  a  directed  path  from  s  to  t.  There- 
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fore,  step  3  for  the  algorithms  has  been  modified  to  terminate  with  no  feasible  solution 
when  u,  us,  or  u’  equal  <*.As  with  the  bipartite  structure  only  about  10%  as  many 
nodes  need  be  scanned  with  the  2-tree  algorithm  as  with  the  single  tree  algorithm. 
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V.  SUMMARY  AND  CONCLUSIONS 


Dijkstra’s  classical  algorithm  has  proven  to  be  extremely  successful  for  the  problem  of 
finding  the  shortest  path  in  a  graph  connecting  two  given  nodes.  A  minor  modification  of 
this  algorithm  suggested  by  Nicholson  [N]  converts  a  good  algorithm  into  an  extraodinary 
method.  While  the  classical  algorithm  required  a  scan  over  approximately  50%  of  the 
problem  nodes,  the  two-tree  algorithm  only  required  a  scan  over  6%  of  the  problem 
nodes.  On  80  problems  containing  2000  nodes  and  2,000,000  edges,  the  new  algorithm 
obtains  the  shortest  path  in  approximately  1.7  seconds/problem  compared  to  7.5  seconds 
for  the  best  classical  algorithm. 
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Figure  1.  Shortest  Path  Without  Prop.  8  Property 


Figure  2.  Example  of  Two-Tree  Dijkstra 
(Shortest  Path  From  1  to  5  has  Length  5) 


Table  1.  Comparison  of  Dijkstra  Algorithms 
on  Dense  m  x  n  Bipartite  Graphs 
(all  times  are  in  seconds  for  20  problems) 
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Table  2. 

Comparison  of  Storage  for  Classical 
and  Two-Tree  Dijkstra  Codes 
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Table  3.  Comparison  of  Dijkstra  Algorithms  on  Random  Networks 
(all  times  are  in  seconds  for  20  problems) 
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ABSTRACT 


The  best  algorithms  for  the  dense  assignment  problem  are  acknowledged  to  be  the  auction 
algorithm  and  the  shortest  augmenting  path  algorithm.  In  this  investigation  we  present  an 
empirical  analysis  of  the  best  software  implementations  of  these  two  methods  on  three 
different  serial  machines.  These  software  implementations  were  developed  by  Professor 
Bertsekas  of  Massachusetts  Institute  of  Technology  and  by  Professors  Jonker  and  Vol- 
genant  of  the  University  of  Amsterdam.  This  was  an  independent  evaluation  of  the  soft¬ 
ware  implementation  of  these  two  algorithms.  For  the  sample  of  problems  examined  and 
the  sample  of  hardware  used  (IBM  3081D,  Sequent  Symmetry  S81,  and  VAX  750),  we 
found  that  the  shortest  augmenting  path  algorithm  was  the  best.  We  also  report  our 
empirical  results  with  a  parallel  version  of  the  shortest  augmenting  path  algorithm.  On 
1200x1200  dense  assignment  problems,  speedups  of  approximately  four  were  achieved 
using  ten  processors.  Million  arc  problems  were  routinely  solved  in  less  than  fifteen 
seconds  on  a  Sequent  Symmetry  S81  with  the  parallel  shortest  augmenting  path  algo¬ 
rithm. 
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I.  INTRODUCTION 


The  classical  assignment  problem  (also  known  as  the  weighted  bipartite  matching 
problem)  is  to  assign  n  men  to  n  distinct  jobs  so  that  the  total  cost  of  assignment  is 
minimized.  Mathematically,  this  may  be  formulated  as  the  following  special  mathemati¬ 
cal  program: 

minimize  X  Cy 

i.  j 

subject  to:  X  x,j  =  1,  (alt  j) 

i 

X  Xjj  =  1,  (all  i) 

j 

Xjj  G  {0,1},  (all  i,  j) 

where  Cy  denotes  the  cost  for  assigning  man  i  to  job  j  and  Xy  =  1  implies  that  man  i  is 
assigned  to  job  j.  Due  to  the  total  unimodularity  of  the  constraint  matrix,  this  problem 
can  be  solved  by  the  simplex  algorithm  and  every  basic  solution  will  have  xSj  G  (0, 1}  (see 
Kennington  and  Helgason  [1980]).  Hence,  classic  linear  programming  duality  theory  and 
Kuhn-Tucker  optimality  conditions  can  be  used  in  algorithm  development  for  this  prob¬ 
lem. 

Specialized  algorithms  for  the  assignment  problem  can  be  classified  into  five  catego¬ 
ries  as  follows: 

(i)  maximum  flow  (primal-dual), 

(ii)  primal  simplex, 

(iii)  dual  simplex, 

(iv)  auction  algorithm,  and 

(v)  shortest  augmenting  paths. 

The  first  maximum  flow  algorithm  was  developed  by  Kuhn  [1955]  and  is  called  the  Hun¬ 
garian  method.  Its  name  comes  from  the  fact  that  the  algorithm  is  developed  from  results 
of  two  Hungarian  mathematicians.  Variations  were  presented  by  Kuhn  [1956].  Derigs 
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[1985]  shows  that  the  shortest  augmenting  path  method  can  be  viewed  as  a  more  eco¬ 
nomical  implementation  of  the  Hungarian  method.  Ahuja,  Magnanti,  and  Orlin  [1988] 
call  the  Hungarian  method  a  primal-dual  variant  of  the  successive  augmenting  path  algo¬ 
rithm. 

A  specialized  primal  simplex  algorithm  for  the  assignment  problem  was  developed  by 
Barr,  Glover  and  Klingman  [1977],  Their  method,  called  the  alternating  basis  algorithm, 
only  considers  a  subset  of  the  possible  bases.  This  idea  was  further  exploited  by  Hung 
[1983]  in  his  development  of  a  polynomial  simplex  algorithm  for  this  problem.  An  exten¬ 
sive  computational  study  comparing  the  Hungarian  algorithm  with  primal  simplex  meth¬ 
ods  was  performed  by  McGinnis  [1983]. 

A  dual  polynomial  simplex  method  known  as  the  signature  method  has  been  devel¬ 
oped  by  Balinski  [1985,  1986].  Extensions  for  the  sparse  assignment  problem  were  devel¬ 
oped  by  Goldfarb  [1985],  and  the  relationship  between  the  signature  method  and  the 
shortest  augmenting  path  method  was  presented  by  Derigs  [1985].  Another  variation  of 
this  algorithm  has  been  developed  by  Akgul  [1988]. 

Bertsekas  [1979,  1981,  1987]  and  Bertsekas  and  Eckstein  [1988]  give  a  complete  theo¬ 
retical  development  of  the  auction  algorithm.  This  algorithm  critically  depends  on  the 
ideas  of  e  -complementary  slackness  and  adaptive  scaling.  The  latest  version  of 
Bertsekas’  auction  code  was  completed  in  June  1988  and  has  been  placed  in  the  public 
domain.  Barr  and  Christiansen  [1989]  have  experimented  with  a  parallel  version  of  this 
algorithm  written  in  C++  on  the  Sequent  Symmetry  S81,  and  Phillips  and  Zenios  [1988] 
have  experimented  with  this  algorithm  on  the  Connection  Machine.  Perry  [1988]  experi¬ 
mented  with  a  parallel  version  of  the  auction  algorithm  on  both  the  Alliant  FX-8  and  the 
Sequent  Symmetry  S81. 


-  55  - 


Hung  and  Rom  [1980]  presented  a  shortest  augmenting  path  method  which  had  both  a 
polynomial  bound  and  good  computational  results.  Other  variants  of  the  shortest  aug¬ 
menting  path  method  have  been  presented  by  Glover,  Glover  and  Klingman  [1986]  and 
Jonker  and  Volgenant  [1987].  An  extensive  computational  study  with  the  shortest  aug¬ 
menting  path  method  may  be  found  in  Derigs  [1985]. 

Recently,  scaling  based  algorithms  have  been  presented  for  the  assignment  problem 
(see  Gabow  [1985]  and  Orlin  and  Ahuja  [1988]).  These  algorithms  are  derivatives  of  the 
Hungarian  method  and  the  auction  algorithm,  respectively,  with  the  added  feature  of  data 
scaling.  Bertsekas  is  using  a  similar  idea  in  his  latest  auction  code. 

There  are  three  ways  to  analyze  the  performance  of  an  algorithm:  worst-case  analy¬ 
sis,  average  case  analysis,  and  empirical  analysis.  The  worst-case  analysis  results  for  the 
assignment  problem  are  presented  in  the  excellent  report  by  Ahuja,  Magnanti,  and  Orlin 
[1988].  The  objective  of  our  study  is  to  present  an  empirical  analysis  of  the  two  top 
performing  serial  codes.  These  codes  were  obtained  directly  from  the  authors,  and  they 
represent  the  current  best  software  implementation  of  the  top  competing  algorithms. 
They  were  run  on  three  different  machines  (IBM  3081D,  Sequent  Symmetry  S81,  and 
VAX  750)  to  allow  for  an  analysis  with  respect  to  differences  in  machine  architecture  and 
compiler.  The  best  code  w'as  then  parallelized  and  the  speedup  achieved  on  a  shared 
memory  multiprocessor  was  reported. 


II.  SEQUENTIAL  CODES 

Five  algorithms  for  solving  dense  assignment  problems  have  been  implemented  and 
computationally  compared  by  various  researchers.  We  are  not  aware  of  any  computa¬ 
tional  studies  involving  the  dual  algorithms.  Derigs  [1985]  shows  that  a  shortest  augment¬ 
ing  path  implementation  is  superior  to  a  Hungarian  implementation.  This  has  been  con- 
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firmed  by  Jonker  and  Volgenant  (1987]  and  by  the  authors  in  a  comparison  with  the 
codes  of  Jonker  and  Volgenant  [1987]  and  Rardin  [1986].  Glover,  Glover  and  Klingman 
[1977]  found  that  their  shortest  augmenting  path  code  was  superior  to  the  specialized 
simplex  code  of  Barr,  Glover  and  Klingman  [1977].  The  authors  have  confirmed  this  with 
a  comparison  of  the  Jonker  and  Volgenant  [1987]  and  Barr,  Glover  and  Klingman  [1977] 
codes.  Jonker  and  Volgenant  [1987]  also  concluded  that  their  dense  shortest  augmenting 
path  code  was  superior  to  the  auction  code  of  Bertsekas  [1981]. 

After  many  studies  over  a  fifteen  year  period,  it  is  acknowledged  that  the  two  best 
algorithms  for  dense  assignment  problems  are  the  auction  algorithm  and  the  shortest 
augmenting  path  algorithm.  We  believe  that  the  best  software  implementation  of  the 
auction  algorithm  is  the  code  of  Professor  Bertsekas  (Version  1.0,  June  1988).  Most  of 
the  other  auction  codes  that  we  have  seen  are  very  sensitive  to  the  cost  structure  and 
degrade  as  the  cost  range  becomes  larger.  These  other  codes  sometimes  work  very  well 
for  small  cost  ranges  and  fail  miserably  for  cost  ranges  as  small  as  [0,1000].  By  the  use 
of  adaptive  scaling,  Professor  Bertseklas’  code  works  well  for  both  a  small  cost  range  and 
a  large  cost  range.  This  code  scales  all  cost  data  by  n+1  and  solves  a  sequence  of  prob¬ 
lems  with  decreasing  values  of  the  stopping  criterion.  All  calculations  are  performed  in 
integer  arithmetic.  We  believe  that  the  best  implementation  of  the  shortest  augmenting 
path  algorithm  was  developed  by  Jonker  and  Volgenant  [1987],  Dijkstra’s  algorithm  is 
used  to  obtain  the  shortest  augmenting  paths  a  ic.  only  integer  arithmetic  is  required.  The 
code  also  incorporates  an  elaborate  pre-orocessing  stage  which  greatly  reduces  the  total 
number  of  times  that  the  Dijkstra  algorithm  is  required.  It  also  uses  a  clever  data  struc¬ 
ture  for  updating  the  dual  variables  after  a  shortest  augmenting  path  has  been  found.  The 
code  maintains  dual  variables  and  the  reduced  costs  are  calculated  as  required.  Both 
codes  are  written  in  standard  FORTRAN. 
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The  empirical  results  of  our  experiment  are  presented  in  Table  1.  For  all  test  prob¬ 
lems,  all  cost  ranges,  and  all  machines,  the  shortest  augmenting  path  code  dominated  the 
auction  code.  The  auction  code  had  the  greatest  difficulty  when  the  cost  range  was  the 
smallest,  i.e.  [0,100].  When  the  cost  range  was  at  least  [0,1000],  the  auction  algorithm 
was  affected  very  little  by  the  cost  range.  The  shortest  augmenting  path  code  was  ad¬ 
versely  affected  by  an  increasing  cost  range.  The  machine  type  definitely  affected  the 
empirical  analysis.  On  the  Sequent,  the  shortest  augmenting  path  code  was  4.41  times 
faster  than  the  auction  code,  on  the  IBM  it  was  3.87  times  faster,  while  on  the  VAX  it  was 
2.79  times  faster.  This  confirms  our  belief  that  the  comparative  performance  of  two 
codes  is  intimately  linked  to  the  hardware,  operating  system,  and  compilers  used.  For  our 
tests  the  IBM  was  running  CMS  and  both  the  Sequent  and  the  VAX  were  running 
UNIXTMt.  Our  results  contradict  the  widely  held  belief  that  the  auction  algorithm  con¬ 
verges  faster  for  lower  cost  ranges.  Professor  Bertsekas’  latest  version  of  the  auction  code 
for  dense  problems  was  not  sensitive  to  changing  the  cost  range  from  [0,1000]  to  [0,100 
000].  In  fact  the  shortest  augmenting  path  code  is  much  more  sensitive  to  the  larger  cost 
ranges  than  the  auction  code  is.  We  also  observed  the  well-known  phenomena  of  the 
auction  algorithm  that  a  significant  amount  of  the  computational  time  is  spent  attempting 
to  complete  the  last  few  assignments.  This  is  in  contrast  to  the  shortest  augmenting  path 
algorithm  that  achieves  one  more  assignment  with  each  application  of  Dijkstra’s  algo¬ 
rithm.  Each  application  of  the  shortest  path  algorithm  can  be  very  expensive,  but  it  is 
guaranteed  to  result  in  one  more  assignment.  This  feature  along  with  the  extensive  pre¬ 
processing  to  obtain  a  good  set  of  partial  assignments  makes  this  approach  work  ex¬ 
tremely  well. 


t  UNIX  is  a  trade  mark  of  AT&T  Bell  Laboratories. 


III.  A  PARALLEL  SHORTEST  AUGMENTING  PATH  CODE 
The  algorithm  of  Jonker  and  Volgenant  [1987]  may  be  divided  into  three  procedures 
as  follows: 

(i)  column  reduction, 

(ii)  augmenting  row  reduction,  and 

(iii)  augmentation  using  a  shortest  path  procedure. 

The  first  two  procedures  require  the  fundamental  operation  of  obtain¬ 
ing  min{d[i]  :  i  G  K}  for  a  given  vector  d[  -  ]  and  a  given  index  set  K.  This  operation  is 
required  twice  during  the  column  reduction  and  once  during  the  row  reduction.  These 
three  “minimum  of  a  vector”  operations  have  been  parallelized  by  using  prescheduled 
data  partitioning.  For  a  p  processor  run,  K  is  partitioned  into  p  subsets  Kj,  ...,KP  having 

K  =  U  Kj  and  Kj  H  Kk  =  <£  for  all  j,  k.  Processor  j  calculates  min{d[i]  :  i  E  Kj), 
j  =  i . p 

and  the  global  minimum  is  set  to  the  smallest  of  the  local  minima.  In  the  shortest  path 
procedure  using  Dijkstra’s  algorithm,  scanning  a  node  requires  comparing  the  current 
distance  label  with  the  distance  label  at  the  node  being  scanned  plus  the  length  of  a  given 
arc.  The  distance  label  is  either  updated  with  the  new  shortest  path  or  remains  un¬ 
changed.  All  of  this  work  is  independent  and  can  also  be  distributed  among  p  processors. 

The  Jonker-Volgenant  algorithm  is  presented  below: 

THE  SHORTEST  AUGMENTING  PATH  ALGORITHM 

Input: 

1.  The  problem  size,  n. 

2.  The  nxn  cost  matrix,  c(i,  j). 

Output: 

1.  x[i]  =  j  implies  that  man  i  is  assigned  to  job  j. 

2.  y[j]  =  i  implies  that  job  j  is  assigned  to  man  i. 
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3.  v[jj  denotes  the  dual  variable  associated  with  job  j. 
procedure  COLUMN  REDUCTION 
begin 

1.  x[i]  *-  0,  i  =  1,  n; 

2.  for  j  =  1,  n 

3.  H  min  {c[i,  j]  :  i  =  1,  n}; 

4.  v[j]  *-  n  and  let  i*E  {i:  c[i,  j]  =  (i  }; 

5.  if  x[i*]  =  0,  then  x[i*]  j,  y[j]  «-  i*; 

6.  end  for 

7.  for  i  =  1,  n 

8.  if  x[i]  *  0,  then 

9.  fi*~  min  {c[i,  j]  -  v[j] :  j  =  1 . n  and  j 

10.  v[x[i] ]  *-  v[x[i]]  -  n\ 

11.  end  if 

12.  end  for 
end 

procedure  AUGMENTING  ROW  REDUCTION 
begin 

13.  1  —  0,  t  —  0; 

14.  for  i=l,  ...,  n 


*  x[i]} 
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15.  if  x[i]  =  0,  then  1  «-  1+1,  f [1]  «-  i; 

16.  end  for 

17.  if  1  =  0,  then  terminate  with  an  optimum; 

18.  m  *-  1,  k  1,  1  *-  0; 

19.  i  «-  f[k],  k  —  k+1; 

20.  Uj  —  min{c[i,  j]  -  v[j]:  j=l,  ....  n},  let  ji  S{j:  c[i,  j]  -  v[j]  =  Uj}, 

u2  —  min{c[i,  j]-  v[j]:  j=l . n  and  j  *  ji  >,  let  j2  £  {j:  c[i,  j]  -  v[j]  =  u2 

and  j  *  ji }; 

21.  ii  *-  y[  ji  ]; 

22.  if  ux  <  u2  ,  then  v[  ]i  ]*-  v[  j!  ]  +  Uj  -  u2  ; 

23.  else  if  ij  =  0,  then  go  to  26,  else  jj  *-  j2,  i3  *~y{  jj]; 

24.  if  ii  =  0,  then  go  to  26; 

25.  if  Ui  <  u2 ,  then  k  *-k-l,  f[k]  *-  ij ;  else  1  *-1+1,  f[l]«-  ii ; 

26.  x[i]  —  ji,  y[  ji  ]*-  i; 

27.  if  k  S  m  go  to  19; 

28:  t  —  t+1; 

29.  if  1  >  0  and  t  <2,  then  go  to  18,  else  continue  with  procedure  BUILD  A  TREE 
end 
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procedure  BUILD  A  TREE 


begin 

30.  m  *- 1 

31.  for  1  =  1 . m 

32.  i*  -  f[l]; 

33.  READY  —  <l>,  TODO  «-  {1,  n},  RSINK  «-  {j:  >’[j]  =  0); 

34.  d[j]  —  c[i*.  j]  -  v[j],  pred [j]  *-i*.  j=l . n; 

35.  fi  -  min{d[j]:  jG  TODO},  SCAN  —  {j:  d[j]  =  H,  jG  TODO},  TODO 
TODOXSCAN; 

36.  if  SCAN  0  RSINK  *  <&,  then  go  to  45; 

37.  for  all  ji  £  SCAN 

38.  i  —  y[  ji  ],  h  *-  c[i,  ji  ]  -  v[  jj  ]  -  //; 

39.  for  all  jG  TODO 

40.  p  —  c[i,  j]  -  v[j]  -  h; 

41.  if  p  <  d[j],  then  d[j]  —  p,  predfj]  *~i; 

42.  end  for 

43.  end  for 

44.  go  to  35 

45.  v[j]  «-  v[j]  +  d[j]  -  fi,  for  all  jG  READY; 
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46. 


let  jE  SCAN  n  RSINK 


47.  i  —  pred[j],  y[j]  —  i,  k  *-j,  j  x[i],  x[i]  —  k; 

48.  if  i  5*  i*  go  to  47; 

49.  end  for 
end 

Steps  3,  9,  20  and  39  are  the  most  computationally  expensive  and  it  is  precisely  these 
steps  which  have  been  parallelized. 

The  dense  assignment  code  of  Jonker  and  Volgenant  [1987]  has  been  parallelized 
using  this  strategy  and  run  on  a  Symmetry  S81  from  Sequent  Computer  Systems,  Inc. 
This  Symmetry  S81  is  a  multiprocessor  system  with  32  Mbytes  of  shared  memory  and 
twenty  Intel  80386  cpu’s.  For  this  study,  both  codes  used  only  integer  arithmetic  and  did 
not  make  use  of  math  co-processors. 

The  empirical  analysis  with  the  parallel  code  is  presented  in  Table  2  with  the  corre¬ 
sponding  speedups  given  in  Table  3.  Each  time  is  the  average  for  five  runs.  Note  that  the 
speedup  for  the  parallel  code  using  a  single  processor  ranged  from  a  high  of  0.87  to  a  low 
of  0.73.  This  implies  that  the  overhead  associated  with  parallel  processing  for  this  code 
ranged  from  13%  to  27%.  As  expected  the  speedups  increased  as  the  problem  size 
increases.  For  the  largest  problems,  speedups  of  approximately  four  were  achieved  using 
ten  processors. 

We  have  also  experimented  extensively  with  parallel  versions  of  the  auction  algorithm, 
but  our  results  were  not  competitive  with  those  presented  in  Table  2.  We  also  developed 
a  modification  of  the  shortest  augmenting  path  code  which  used  a  Dijkstra  two-tree  short¬ 
est  path  algorithm  (see  Helgason,  Kennington  and  Stewart  [1988]),  but  that  system  was 
not  competitive  with  the  original  shortest  augmenting  path  implementation.  At  the  termi- 
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nation  of  the  classical  Dijkstra  shortest  path  algorithm,  all  the  information  required  to 
update  the  duals  is  available  and  the  dual  update  can  be  executed  very  efficiently.  The 
two-tree  Dijkstra  method  can  obtain  the  shortest  augmenting  path  faster  than  the  classical 
Dijkstra  shortest  path  method;  however,  additional  work  is  required  to  discover  which 
duals  must  be  changed  and  by  what  amount.  The  overhead  required  for  the  dual  variable 
updates  exceeded  the  potential  benefits  of  the  two-tree  Dijkstra  method  for  finding  the 
shortest  augmenting  path. 

For  the  problem  sizes  and  cost  ranges  analyzed,  the  times  in  Table  2  are  the  best 
times  that  we  have  seen.  The  parallel  shortest  augmenting  path  code  is  a  powerful  tool 
that  can  easily  solve  all  1,000,000  arc  dense  assignment  problems  in  less  than  seventeen 
seconds  using  six  processors.  The  1,000,000  arc  dense  assignment  problem  with  a  cost 
range  of  [1,1000]  required  over  five  minutes  for  the  parallel  auction  code  of  Barr  and 
Christiansen  [1989]  using  six  processors. 


IV.  SUMMARY  AND  CONCLUSIONS 

The  empirical  analysis  presented  in  this  study  indicates  that  for  dense  assignment 
problems  having  a  size  up  to  800x800,  the  shortest  augmenting  path  software  is  faster 
than  the  auction  algorithm  software.  This  conclusion  was  based  on  test  runs  with  sixteen 
randomly  generated  test  problems  with  four  different  cost  ranges  and  run  on  three  differ¬ 
ent  serial  machines.  Contrary  to  the  widely  held  belief  that  the  auction  algorithm  per¬ 
forms  worse  as  the  cost  range  increases,  we  found  this  not  to  be  the  case.  We  believe  that 
Professor  Bertsekas’  latest  implementation  (Version  1.0,  June  1988)  has  eliminated  this 
difficulty.  We  did  observe  the  difficulty  with  the  ’’end  game”  in  which  an  inordinate 
amount  of  time  is  required  to  complete  the  last  few  assignments.  The  shortest  augment¬ 
ing  path  method  has  the  attractive  feature  that  each  time  a  shortest  path  is  calculated,  one 
new  assignment  is  made.  We  found  that  the  shortest  augmenting  path  code  was  adversely 
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affected  by  an  increasing  cost  range.  As  the  cost  range  increases,  larger  trees  must  be 
developed  by  Dijkstra’s  algorithm  to  obtain  the  shortest  path  from  an  unassigned  man  to 
an  unassigned  job. 


With  only  a  moderate  amount  of  code  development,  we  parallelized  the  shortest  aug¬ 
menting  path  code  of  Jonker  and  Volgenant  [1987]  for  the  Sequent  Symmetry  S81. 
Speedups  of  approximately  four  were  achieved  on  1200x1200  dense  problems  using  ten 
processors.  Remarkably,  1,000,000  arc  dense  assignment  problems  were  solved  using 
this  parallel  code  in  less  than  fifteen  seconds  (wall  clock  time).  Even  though  this  code 
was  developed  for  a  particular  multiprocessor  system  with  shared  memory,  it  can  be  used 
with  any  shared  memory  parallel  processing  system. 
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Table  1.  Comparison  of  Sequential  Algorithms  on  Dense  nxn 
Assignment  Problems  (all  times  are  in  seconds) 
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Table  2.  The  Parallel  Shortest  Augmenting  Path  Code  (all  times 
are  in  seconds  for  dense  n  x  n  assignment  problems) 
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ABSTRACT 


The  objective  of  this  investigation  was  to  analyze  simplex  based  algorithms  for  the  gener¬ 
alized  network  problem  in  both  the  sequential  and  parallel  computational  environment.  In 
comparisons  with  MPSX,  it  was  shown  that  a  generalized  network  code  is  ten  times  faster 
than  this  general  linear  programming  system.  It  w'as  also  shown  that  relaxing  the  restric¬ 
tion  that  at  least  one  of  the  multipliers  associated  with  an  arc  be  one  (minus  one),  results 
in  an  additional  computational  expense  of  ten  percent.  A  parallel  asychronous  primal 
simplex  algorithm  was  developed  and  tested  on  a  Sequent  Symmetry  S81.  Test  problems 
having  two  thousand  nodes  and  fifty  thousand  arcs  were  solved  in  from  three  to  four 
minutes  using  a  single  cpu  and  in  less  than  two  minutes  using  eight  cpu’s. 
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I.  INTRODUCTION 


The  generalized  network  problem  (also  called  the  flow  with  gains  models  in  its  most 
general  form  is  defined  as  follows: 

minimize  cx  (1) 

s.t.  Gx  =  r  (2) 

0<  x<  u,  (3) 

where  G  is  an  mxn  matrix  having  at  most  two  nonzero  entries  in  each  column,  c  is  a 
lxff  vector  of  costs,  r  is  an  mxl  vector  of  right-hand-sides,  and  u  is  an  nxl  vector  of 
upper  bounds.  Many  authors  place  the  additional  restriction  on  (1)— (3)  that  at  least  one 
of  the  nonzero  entries  in  each  column  of  G  be  a  one  (minus  one).  Not  all  instances  of 
(1)— (3)  can  be  scaled  to  meet  this  restriction  and  the  new  code  developed  for  this  investi¬ 
gation  does  not  require  this  restriction.  Associated  with  each  matrix  G  is  a  graph  [V,E], 
where  V  is  a  set  of  nodes  and  E  is  a  set  of  pairs  of  nodes  (edges).  The  nodes  correspond 
to  the  rows  or  G  and  the  edges  correspond  to  columns  of  G.  If  the  k,h  column  of  G, 
G(k),  has  a  single  nonzero  entry  in  row  i,  then  the  corresponding  edge  is  denoted  by  (i,i). 
If  G(k)  has  two  nonzero  entries  in  rows  i  and  j,  then  the  corresponding  edge  is  denoted  by 

O.j)- 

1.1.  Survey  of  Literature 

The  graphical  structure  of  a  basis  for  G  allows  the  use  of  labeling  procedures  for  basis 
representation.  Glover,  Klingman,  and  Stutz  [1973]  developed  the  first  specialized  primal 
simplex  code  (NETG)  which  exploited  this  graphical  structure.  Many  theoretical  and 
computational  improvements  have  been  made  to  this  system  over  the  last  fifteen  years 
(see  Glover,  Hultz,  Klingman,  and  Stutz  [1978]  and  Elam,  Glover,  and  Klingman  [1979]). 
A  similar  implementation  was  also  developed  by  Langley  [1973].  Adolphson  and  Heum 
[1981]  presented  computational  results  with  their  generalized  code  which  used  an  exten¬ 
sion  of  the  threaded  index  method  of  Glover,  Klingman,  and  Stutz  [1974],  Brown  and 
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McBride  [1984]  presented  the  details  of  their  generalized  network  code  (GENNET)  and 
offered  the  source  code  to  our  university  for  a  nominal  price.  Tomlin  [1984]  developed 
the  first  assembly  language  code  which  is  part  of  Ketron’s  MPS  lU  system.  Recently, 
other  codes  have  been  developed  by  Enquist  and  Chang  [1985],  Mulvey  and  Zenios 
[1985],  and  by  Ali,  Chames,  and  Song  [1986].  The  first  parallel  generalized  code  was 
developed  by  Chang,  Enquist,  Finkel,  and  Meyer  [1987]  for  the  Wisconsin  Crystal  multi¬ 
computer  and  the  second  by  Clark  and  Meyer  [1987].  The  first  C  language  code  was 
developed  by  Nulty  and  Trick  [1988].  Another  assembly  language  code  has  been  devel¬ 
oped  by  Chang,  Chen,  and  Chen  [1988].  The  code  discussed  in  this  paper  is  one  of  eight 
developed  by  Muthukrishnan  [1988].  A  summary  of  the  available  softw-are  may  be  found 
in  Table  1. 

1.2.  Parallel  Computing  Machines 

The  UNTV  AC  1,  which  appeared  in  1951,  was  the  first  commercially  produced  com¬ 
puter.  Four  generations  of  computer  development  have  since  passed  w'ith  each  generation 
exhibiting  major  improvements  over  the  previous  one.  These  improvements  were  made 
possible  in  part  by  the  introduction  of  greater  parallelism  at  all  levels  of  the  computer 
architecture.  The  first  generation  machines  were  built  with  vacuum  tubes  and  electrome¬ 
chanical  relays.  The  second  generation  machines  were  build  with  discrete  diodes,  transis¬ 
tors,  and  printed  circuit  boards.  Integrated  circuits  were  introduced  in  the  third  genera¬ 
tion  machines  and  large  scale  integration  is  used  in  the  fourth  generation  machines. 

The  first  significant  work  in  the  area  of  parallel  processing  occurred  in  the  1970’s  with 
the  development  of  the  C.mmp  at  Carnegie  Mellon  University.  In  early  digital  computers, 
the  memory  and  the  central  processing  unit  were  made  of  different  materials.  Memory 
was  cheaper  than  processing  and  the  early  parallel  machines  consisted  primarily  of  mini¬ 
computers  with  small  address  spaces  and  limited  computing  power.  Flence,  they  were 
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fairly  expensive  slow  machines  compared  to  a  single  processor  mainframe.  A  technologi¬ 
cal  breakthrough  made  possible  the  fabrication  of  both  the  memory  and  the  cpu  from  the 
same  material.  The  cost  of  the  processors  was  reduced  and  it  became  possible  in  the 
1980’s  to  build  inexpensive  multiprocessor  machines.  Now  over  twenty  different  parallel 
computers  are  commercially  available. 

Parallel  machines  are  classified  according  to  the  operational  characteristics  of  both 
instruction  and  data  streams.  The  stream  characteristics  are  categorized  as  follows: 
single  instruction  stream  (SI),  multiple  instruction  stream  (MI),  single  data  stream  (SD) 
and  multiple  data  stream  (MD).  In  SISD  machines,  a  single  stream  of  instructions  oper¬ 
ates  on  a  single  stream  of  data.  This  is  the  classical  von  Neuman  architecture  which  is 
used  in  the  following  machines:  VAX  11/780,  IBM  3081,  and  Cray-1.  Machines  which 
use  array  processors,  such  as  the  CDC  Cyber  205  and  IBM  3090-600,  and  The  Connec¬ 
tion  Machine  CM-1  with  64,000  one  bit  processor  are  called  SIMD  machines.  From  an 
algorithm  designers  point  of  view,  the  most  interesting  machines  are  MIMD  machines.  In 
these  machines,  each  processor  can  execute  different  instructions  on  different  data  seg¬ 
ments  simultaneously.  Among  others,  the  Sequent  Symmetry  S81,  Encore  Multimax, 
Intel  iPSC  hypercube,  FPS  T-20  hypercube.  Butterfly,  Alliant  FX/8  and  AT&T  KORBX 
belong  to  this  category. 

MIMD  machines  can  be  further  classified  as  either  multiprocessors  or  multicomputers. 
Multiprocessors  have  both  private  and  shared  memory  while  multicomputers  have  only 
private  memory.  Hence,  communication  among  cpu’s  on  a  multicomputer  is  through  the 
slower  procedure  of  message  passing.  Within  the  class  of  multiprocessors  further  distinc¬ 
tion  can  be  made  depending  on  the  processor  interconnection  pattern  as  tightly  coupled 
and  loosely  coupled  machines.  The  new  hardware  developed  by  Alliant,  Cray,  Encore, 
and  Sequent  are  tightly  coupled  shared  memory  MIMD  machines. 
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1.3.  Objective  of  the  Investigation 

The  primary  objective  of  this  investigation  was  to  develop  and  computationally  test  a 
parallel  primal  simplex  based  algorithm  for  the  generalized  network  problem.  This  model 
was  selected  for  investigation  due  to  the  fact  that  the  basis  for  a  generalized  network 
problem  decomposes  naturally  into  a  block  diagonal  structure  which  can  be  easily  main¬ 
tained  and  updated  using  graphical  data  structures  (labeling  procedures).  This  study 
begins  with  the  excellent  generalized  network  code,  GENNET,  developed  by  Brown  and 
McBride  [1984], 
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II.  PARALLEL  CONSTRUCTS 

Parallel  algorithms  use  modules  (subroutines)  which  may  be  executed  in  parallel. 
Suppose  there  are  r  processors  available  for  use.  The  S£I  procs  parallel  programming 
construct  generates  r  -1  clones  of  the  running  process  and  places  them  in  a  wait  state. 
The  parallel  operations  are  initiated  by  the  main  program  using  statements  of  the  form: 

for  processors  =  1  to  t  ,  fork  module  WORK. 

The  main  program  and  the  r  -1  clones  each  execute  module  WORK.  Processing  in  the 
main  program  continues  only  after  all  processors  complete  execution  of  WORK  and  the 
clones  return  to  a  wait  state  until  the  next  fork  is  executed.  In  order  to  allow  for  mutual 
exclusion  of  certain  sections  of  code,  variables  can  be  designated  as  locks.  Variables  so 
designated  can  assume  two  states:  locked  (1)  or  unlocked  (0).  Locked  sections  of  code 
appear  as  follows: 

lock  [s] 


unlock  [s] . 

If  a  process  reaches  a  lock  statement  and  s=0,  then  it  sets  s  to  1  and  continues.  Other¬ 
wise,  the  process  spins  until  s=0,  then  it  sets  s  to  1  and  continues.  When  a  process 
reaches  an  unlock  statement  it  sets  s  to  0.  Two  processors  can  never  execute  the  code 
between  the  lock  and  unlock  statements  simultaneously. 
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III.  PARALLEL  SIMPLEX  FOR  GENERALIZED  NETWORKS 
The  best  algorithm  for  solving  the  generalized  network  problem  is  a  specialization  of 
the  primal  simplex  method  which  exploits  the  underlying  graphical  structure  of  the 
model.  By  a  rearrangement  of  rows  and  columns,  every  basis  for  (l)-(3)  can  be  placed  in 
the  block  diagonal  form 

Bj 

B2 

Bp_ 

where  each  Bj  ,  i=l,  p,  is  either  lower  triangular  or  nearly  lower  triangular  with  only 
one  element  above  the  diagonal.  Furthermore,  each  component  B;  corresponds  to  a  con¬ 
nected  graph  and  all  the  simplex  operations  can  be  carried  out  directly  on  this  graph.  The 
simplex  algorithm  for  this  model  in  its  most  general  form  is  presented  below: 

Primal  Simplex  Algorithm  for  Generalized  Networks 

Input: 

1.  A  graph  [V,E]. 

2.  A  cost  c[e]  and  arc  capacity  u[e]  for  each  e  G  E. 

3.  The  generalized  constraint  matrix  G. 

4.  A  requirement  r[n]  for  all  n  G  V. 

Output: 

1.  The  termination  type  indicator  /?  and  flow  3T  [e]  for  all  e  G  E.  (j3  =1  implies  that 
the  problem  is  unbounded,  /S  =  2  implies  that  the  problem  has  no  feasible  solution, 
and  /?  =  3  implies  that  the  optimal  solution  is  given  in  x  [e]  for  all  e  G  E.) 
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Working  Entities: 

1.  An  array  n  [n],  the  dual  variable  for  each  n  G  V. 

2.  An  array  y[n],  the  component  of  the  updated  column  for  each  n  G  V. 

3.  A  set  Eb  of  basic  arcs. 

4.  A  matrix  B  of  columns  of  G  corresponding  to  EB. 

Procedure  SIMPLEX 

begin 

1.  let  Eb  C  E  denote  the  set  of  basic  arcs  with  corresponding  basis  B,  and  let 

x  [e]  denote  the  flows  for  all  e  G  E; 

2.  /S-0; 

3.  tt  *-  cbB~\  where  cB  denotes  the  costs  associated  with  EB; 

4.  call  module  PRICE; 

5.  if  s* 0.  then  terminate; 

6.  cali  'ule  RATIO; 

7.  if  *0,  then  terminate; 

8.  call  module  UPDATE; 

9.  go  to  4. 

end. 

Procedure  PRICE 
begin 

1.  Pl=  {j  :  *GQ)-c[j]  >0,5t{j]  =0,j  G  E\EB}; 

2.  Pus  11 :  ^GG)-c[j]  <  0, 5T[j]  =  u[j],j  G  E\Eb}; 

3.  if  Pl  U  Pu  =  0.  then 

4.  if  Eb  contains  an  artificial  variable  having  flow  >  0,  then  *-2;  otherwise, 
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else 

5.  select  k  E  PL  U  Pu: 

6.  if  k  £  PL,  then  <5  «-l;  otherwise,  <5  *--l; 
end  if 

end. 

Procedure  RATIO 


1. 

2. 

3. 

4. 

5. 


begin 

y  -  B-’G(k) 

A]  -  min  {-^4,  <*  :  cr(y[i])  =  <5,  GQ)  =  B(i),  i  =  1, ....  |V|}; 

:-ff(y(i]).(S.GG)  =  B(i),  i-1 . 

A  «-  min  {Aj,  A2,  u[k]}; 
if  A  =  «  ,  then  (3  *-  1; 


end. 

Procedure  UPDATE 


Degin 

1.  xfk]  *-  xfk]  +  Ad 

2.  5T{j)  *-  xfj]  -  Adyfi]  for  i-1 . | V|  and  G(j)  =  B(i); 

3.  if  A  *  u[k],  then 

4.  UL  =  {j  :  xfj]  =  0,  cr(y[i])  =  6 ,  G(j)  =  B(i)}; 

5.  Uu  =  (j  :  xfj]  =  u[j],  -  a(y[ij)  =  d,  G(j)  =  B(i)}; 

6.  select  r  6  UL  ]J  Uu 


|V|} 
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7.  Eb  *-  (Eb  U  {k})\{r}  and  update  B  so  that  it  corresponds  to  Eb. 

8.  7T «- cB  B~\  where  cB  denotes  the  costs  associated  with  Eb; 

9.  end  if 
end 

Since  the  components  of  the  partitioned  basis 

B, 

B2 

Bp_ 

correspond  to  connected  components  of  a  graph,  the  partitioning  is  easily  maintained 
after  a  pivot  is  performed.  Since  each  column  of  G  has  at  most  two  nonzero  entries,  a 
simplex  pivot  can  affect  at  most  two  of  the  p  partitions.  Therefore,  multiple  processors 
can  be  performing  pivot  operations  simultaneously  on  different  partitions  of  the  basis.  In 
this  parallel  implementation  of  the  simplex  algorithm,  all  processors  execute  the  simplex 
method  asynchronously  and  a  description  of  the  method  requires  only  an  explanation  of 
the  operation  of  a  single  processor. 

For  a  processor,  the  pricing  module  is  called  and  some  edge  which  prices  favorably  is 
selected  for  flow  change.  If  the  component(s)  associated  with  this  edge  are  locked,  then 
the  processor  returns  to  the  pricing  module.  Otherwise,  the  component(s)  are  locked  and 
the  candidate  edge  is  epriced.  If  the  candidate  edge  prices  unfavorably  during  the  repric¬ 
ing  stage,  the  component(s)  are  unlocked  and  the  processor  returns  to  the  pricing  module. 
Pivoting  is  executed  only  if  the  candidate  edge  prices  favorably  w'hen  repriced.  After 
returning  from  the  update  module,  the  affected  components  are  unlocked.  The  parallel 
simplex  algorithm  specialization  in  its  most  general  form  is  presented  below: 
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Parallel  Primal  Simplex  Algorithm  for  Generalized  Networks 


Input: 

1.  A  graph  [V,E]. 

2.  A  cost  c[e],  arc  capacity  u[e],  from  (tail)  node  f[e],  and  to  (head)  node  t[e]  for  all 
e  €  E. 

3.  The  generalized  constraint  matrix  G. 

4.  A  requirement  r[n]  for  all  n  E  V. 

5.  The  number  of  processors,  t  ,  available  for  use. 

Output: 

1.  The  termination  type  indicator  and  flow  x  [e]  for  all  e  €  E.  (fi  =1  implies  that 
the  problem  is  unbounded,  /S  =2  implies  that  the  problem  has  no  feasible  solution,  and 
P  =3  implies  that  the  optimal  solution  is  given  in  x[e]  for  all  e  E  E.) 

Working  Entities: 

1.  An  array  *r[n],  the  dual  variable  for  each  n  E  V. 

2.  An  array  y[n],  the  component  of  the  updated  column  for  each  n  E  V. 

3.  A  set  Eb  of  basic  arcs. 

4.  A  matrix  B  of  columns  of  G  corresponding  to  Eb- 

5.  A  variable  s  to  be  used  as  a  lock. 

6.  An  array  comp[n]  which  gives  the  component  number  in  which  n  E  V  is  a 
member. 

7.  An  array  lockc[m]  which  has  the  value  of  one  if  component  m  is  busy  and  zero, 
otherwise. 
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Procedure  PARALLEL  SIMPLEX 


begin 

1.  let  EB  C  E  denote  the  set  of  basic  arcs  with  corresponding  basis  B,  and  let  x  [e] 
be  the  flows  for  all  e£E; 

2.  lockc[n]«-0  for  all  n  E  V; 

3.  let  B  be  partitioned  into 

B, 

B2 

Bp_ 

corresponding  to  the  connected  components  of  [V,  Eb]  and  set  comp[n]  equal 
the  partition  number  for  which  node  (row)  n  is  a  member; 

4.  ££i  procs  r 

5.  for  processors  =  1  to  r  ,  fork  module  SOLVER; 
end. 

Procedure  SOLVER 

local  data:  /?,  k,  <5,  A,  f,  t,  UL,  Uu,r 

begin 

1.  £-0; 

2.  7t*~  CbB'1,  where  Cb  denotes  the  costs  associated  with  Eb; 

3.  call  module  PRICE; 

4.  if  *  0,  then  return; 

5.  lock  s 

6.  if  lockc[comp[f[k]]]  =  0  and  lockc(comp[t[k)]]  =  0,  then; 
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7.  (comment:  lock  components  associated  with  the  entering  arc) 

8.  lockc[comp[f[k]]j  *-  1; 

9.  lockc[comp[t[k]]]  *-  1; 

(comment:  reprice  entering  arc) 

10.  if  (xfk]  =  0  and  ^rOG)  -  c[j]  <  0)  or  (x{k]  =  u[k]  and  7rG(j)-c[j]  s  0) 
then 

(comment:  entering  arc  does  not  price  favorable,  some  other  processor 
modified  dual  after  pricing) 

11.  lockc[comp[f[k]]]  *-  0; 

12.  lockc[comp[t[k]]]*- 0; 

13.  unlock  [s]; 

14.  go  to  3; 

15.  end  if 

16.  unlock  s; 

17.  else 

18.  unlock  s; 

19.  go  to  3; 

20.  end  if 

21.  call  module  RATIO; 

22.  if  ft  *  0,  then  return; 

23.  F«-comp[f[k]j]t  f«-comp[t(k)]]; 

24.  call  module  UPDATE; 

25.  update  comp[n]  for  all  n  6  Vto  correspond  to  Eb; 

26.  lockc[f  ]*-0,  lockc[t  ]•*—  0; 

27.  go  to  3; 

end 
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IV.  COMPUTATIONAL  EXPERIENCE 

This  investigation  began  with  the  excellent  generalized  network  code  of  Brown  and 
McBride  [1984]  known  as  GENNET.  This  code  requires  that  the  problem  be  scaled  so  that 
at  least  one  of  the  nonzero  entries  in  every  column  of  G  must  be  one.  This  restriction  has 
two  disadvantages,  (i)  it  means  that  not  all  instances  of  (1)— (3)  can  be  solved  with  this 
code  and  (ii)  it  is  awkward  to  use  this  code  as  the  continuous  relaxation  solver  within  a 
branch-and-bound  framework  for  integer  generalized  networks.  That  is,  scaling  an  inte¬ 
ger  variable  prior  to  application  of  an  integer  solver  requires  special  handling  of  that 
integer  variable.  The  modified  code  which  allows  for  arbitrary  nonzero  entries  in  each 
column  of  G  is  called  GENFLO. 

The  computational  experience  comparing  MPSX  (the  IBM  proprietary  mathematical 
programming  system),  NETFLO  (Kennington  and  Helgason  [1980]),  GENNET  (Brown 
and  McBride  [1984]),  and  GENFLO  on  pure  network  flow  problems  may  be  found  in 
Table  2.  The  problem  number  refers  to  the  NETGEN  numbers  (see  Klingman,  Napier, 
and  Stutz  [1974]).  These  problems  were  all  run  on  an  IBM  3081-D24.  The  three  special¬ 
ized  codes  are  written  in  FORTRAN  and  used  the  FORTVS  compiler  with  OPT=2.  All 
times  are  in  cpu  seconds  and  exclude  input  and  output. 

NETFLO  assumes  that  every  column  of  G  has  two  nonzero  entries  w'hich  are  one  and 
minus  one.  GENNET  assumes  that  every  column  has  at  most  two  nonzero  entries  and 
one  must  be  one.  GENFLO  assumes  that  every  column  has  at  most  two  nonzero  entries 
and  MPSX  makes  no  assumptions  about  the  entries  in  G.  For  these  fourteen  test  prob¬ 
lems,  NETFLO  is  sixty  times  faster  than  MPSX,  GENNET  is  fifty  times  faster,  and 
GENFLO  is  forty  times  faster. 

The  computational  experience  comparing  MPSX,  GENNET,  and  GENFLO  on  general¬ 
ized  i.etwork  problems  may  be  found  in  Table  3.  The  test  problems  were  generated  by  a 
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modification  of  NETGEN  called  GNETGEN  developed  by  Professor  Klingman  and  col¬ 
leagues  at  the  University  of  Texas  at  Austin  and  loaned  to  us  by  Professor  Glover  and 
colleagues  at  the  University  of  Colorado  in  Boulder.  Both  specialized  codes  are  ten  times 
faster  than  MPSX.  The  arbitrary  second  multiplier  in  GENFLO  results  in  a  computational 
expense  of  approximately  ten  percent. 

GENFLO  has  been  parallelized  using  the  asychronous  parallel  simplex  method  pre¬ 
sented  in  Section  3  and  run  on  the  two  parallel  multiprocessor  systems  produced  by  Se¬ 
quent  Computer  Corporation.  The  computational  times  are  presented  in  Table  4  and  the 
speedups  are  presented  in  Table  5.  Test  problems  G1  through  G4  are  grid  problems  and 
R1  through  R4  are  random  problems  generated  by  GNETGEN.  Problem  Cl  was  gener¬ 
ated  with  a  modification  of  GNETGEN  designed  to  produce  problems  with  a  large  number 
of  components  at  optimality.  The  change  involved  modifying  the  cost  range  so  that  the 
cost  for  random  arcs  dominated  the  cost  for  the  arcs  in  the  skeleton  network.  Problems 
generated  using  this  cost  structure  yielded  a  large  number  of  components  (partitions)  in 
the  optimal  basis.  The  column  entitled  “components  at  optimality”  gives  the  number  of 
partitions  in  the  optimal  basis.  All  runs  began  with  an  all  artificial  start.  Since  the 
asychronous  simplex  algorithm  follows  a  different  path  in  each  parallel  run,  the  times 
(and  number  of  iterations)  for  solving  a  given  problem  may  vary  substantially.  To  ac¬ 
count  for  variability,  each  problem  was  solved  three  times  and  the  average  is  reported. 
The  speedups  varied  from  approximately  two  to  three  using  eight  processors. 
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V.  SUMMARY  AND  CONCLUSIONS 


The  development  of  relatively  inexpensive  parallel  computers  has  generated  wide¬ 
spread  interest  in  the  development  of  new  optimization  algorithms  for  such  machines. 
Although  parallel  computers  come  in  a  variety  of  architectures,  the  popularity  of  multi¬ 
ple-instruction  multiple-data  (MIMD)  machines  can  be  attributed,  in  part,  to  the  ease 
with  which  codes  intended  for  sequential  computers  can  be  ported  to  these  machines.  In 
this  study,  an  asychronous  simplex  method  for  the  generalized  network  problem  has  been 
implemented  and  computationally  tested  on  a  Sequent  Balance  21000  and  a  Sequent  Sym¬ 
metry  S81.  A  speedup  of  approximately  three  was  achieved  on  a  90x90  grid  problem 
having  8100  nodes  and  16,020  arcs  using  eight  cpus.  Random  problems  having  2000 
nodes  and  50,000  arcs  yielded  speedups  of  approximately  two  using  eight  cpus. 
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Table  1 .  Generalized  Network  Codes 


Code 

Language 

Authors 

Year 

NETG 

FORTRAN 

Glover,  F. 

Klingman,  D. 

Stutz,  J. 

1973 

FORTRAN 

Langley,  W. 

1973 

FORTRAN 

Adolphson,  D. 

Heum,  L. 

1981 

GENNET 

FORTRAN 

Brown,  G. 

McBride,  R. 

1984 

GWHIZNET 

ASSEMBLER 

Tomlin,  J. 

1984 

GRNET 

FORTRAN 

Enquist,  M. 

Chang,  M. 

1985 

LPNETG 

FORTRAN 

Mulvey,  J. 

Zenios,  S. 

1985 

FORTRAN 

AH,  1. 

Charnes,  A. 

Song,  T. 

1986 

FORTRAN 

Chang,  M. 

Enquist,  M. 

Finkel,  R. 

Meyer,  R. 

1987 

PGRNET 

FORTRAN 

Clark,  R. 

Meyer,  R. 

1987 

GNO/PC 

C 

Nulty,  W. 

Trick,  M. 

1988 

GRNET- A 

ASSEMBLER 

Chang,  M. 

Chen,  M. 

Chen  C. 

1988 

GENFLO 

FORTRAN 

Muthukrishnan,  R. 

1988 

Table  2.  Solution  Times  for  Pure  Network  Problems 
(all  times  are  in  seconds  on  an  IBM  3081 D) 
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Table  3.  Solution  Times  for  Generalized  Networks 
(all  times  are  in  seconds  on  an  IBM  3081 D 


Table  4.  Solution  Times*  for  the  Parallel  Simplex  Solver 
(times  are  in  seconds) 
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'Problems  G1-G4  were  run  on  a  Sequent  Balance  21000,  all  other  problems  were  run  on  a 
Sequent  Symmetry  S81. 


Table  5.  Speedups  for  the  Parallel  Simplex  Solver 
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